- I am delighted to welcome
professor Josh Tenenbaum who's a computational
cognitive scientist at MIT in the Brain and Cognitive
Sciences Department here at Yale. Josh is an amazing and brilliant thinker. And you'll find that out, if you don't already know that. Many of us already know that, but you'll find that out as we
have him here today to talk. And I guess what I'll, I mean, the thing I wanna
say about Josh is that, his mind is amazing,
he's a wonderful person, he's a preeminent computational cognitive scientist in, I think, the world, and his genius was recognized in 2019 by the MacArthur Foundation. For those of us that, as
I said, already know him, we didn't need that to
recognize his genius. But there's not much else I can say, just that we're very happy
that you can visit us today, give the talk and spend
time with us afterwards. And you are gonna tell us about what, I can't remember What,
- [Josh] Kind. - What kind of computation is cognition? - [Josh] Yeah, it's a
question deliberately. So, I hope we'll have the discussion. - What kind of computation is cognition? - Thank you.
- All right. Great, thank you so much for having me, and to Laurie and Brian for inviting me. It's a great honor and pleasure to be here. My understanding is, this is one of the first in-person seminars that people have had here in a while. It's also the first in-person seminar that I've given in quite a long time. So, maybe in my enthusiasm and excitement, I've tried to pack too much into this talk. So, I will talk about some thoughts on what kind of computation is cognition. And a lot of this is really designed to raise questions for discussion that then I hope we'll maybe
have a few minutes for, with the larger group and also
with the class afterwards. Okay, so, there's a few
questions in this talk. That is the main one. But some questions in the
background are these ones, what would constitute a meaningful
answer to this question? And how do we know if it's right
or even on the right track? In the spirit of this seminar, which is an interdisciplinary meeting between cognitive scientists and people interested in metaphysics, I wanted to take this opportunity
to raise these questions, which are essentially questions of metaphysics and epistemology about how do we model the mind. I think there are really
deep issues here to discuss. I'm gonna present one approach. I'm not really gonna talk
about these questions. I'm gonna present one approach, which I call the reverse
engineering approach. And then I'd be very
interested to talk with you about its pros and cons, what
makes sense, what doesn't, and what are some other approaches, okay. I also think there are intriguing and deep parallels between the way we think
about answering this question and the way we think about
actually modeling the mind that reflects some of
the special relationships between cognitive science
and just more generally, philosophy of science and how we think about what makes meaningful
scientific questions and models. But that will just be in the background and maybe we can discuss afterwards. So, what I mean by reverse engineering is trying to characterize
how the human mind works in the same terms that we would use to engineer an intelligent machine, okay. But instead of doing
artificial intelligence, we'll be doing, let's call it natural intelligence, all right. But our goal is to build models that look like AI
systems in a sense, okay. To me, what a meaningful
answer to this question is, I would like to understand how the mind works in those terms. And the way we know that
we're on the right track 'cause I think none of these models that we're gonna talk about are right, but hopefully they're, in
some ways, on the right track is, you know, the same thing
we normally do in science, we compare our models to data. That means we try to capture qualitatively how human behavior works. We want our models to behave like people. But also, quantitatively,
the models often have a lot of moving parts, which I can only begin to
gesture at here for you. But in order to test those
in some rigorous way, it often helps to have quantitative data. So, I want to give you a
feel for how we do that. But most distinctively I think, our goal is not just to capture the data from laboratory experiments, which are great and the bread
and butter, the gold standard, but they're limited in terms
of the scope of what we can do. Our goal is also, to
build models that actually solve the problems that people do. So, in my work and in a lot
of the work I'll talk about, we do things that basically look like AI. They are AI in some form, we implement our systems in robots, or we have them solve AI problems, the same kind of problems that humans face in the real world. And it's essential, we think, to know we're on the right track, that the models both give a qualitative and hopefully quantitative
account of human behavior, but they also solve the problem
that humans have to solve, which gives us some reason to think that they might be describing
the actual computations in the mind and maybe
even the brain, okay. But those are some of the hard problems. What does it really mean to describe the actual computations
in the mind or the brain, that I think we'll only have time to discuss in the following session, all right. Now, to motivate this kind of approach, I think it's helpful to reflect on the current state of
artificial intelligence or AI. Whether you are in cognitive
science or metaphysics or any number of other fields, including any of the
humanities, you can't ignore AI. It's everywhere around us for
better and worse, I think. But I think it's also important to reflect on the state of what we mean by AI. The way I like to put it is to say, we have all these AI
technologies increasingly useful and perhaps, or not perhaps, actually increasingly dangerous, but increasingly powerful technologies, but we don't have any real
AI, okay, effectively, at this point. What I mean by that is
we have these systems that do things we used to
think only humans could do and now machines can
do them to some extent, but we don't have anything like the flexible general purpose
kind of common sense, the general notion of intelligence that each and every one of you use to do every one of these
things for yourself without having to be programmed by some dedicated team of engineers at a big tech company
or a hot startup, okay. You just do these things. And we don't have any machine that can just do all
these things for itself. That was the original vision of the founders of the field, right. What's sometimes called a
general artificial intelligence. One of the most interesting
things about cognitive science and the study of human cognition
is how might the mind work that we can do all these things and so many other things for ourselves. Now, reflecting on that gap, I think, helps to motivate again
what are possible ways to answer the question, of what kind of computation is cognition and what I think are some
of the more promising ones. I wanna just dive a little bit deeper into one, very useful but
very imperfect AI technology, and that's self-driving cars
as they're sometimes called or autonomous driving. This is an article which I believe is the first major press
article on self-driving cars, particularly on the Google effort which is now something called Waymo as part of the Alphabet
corporate enterprise. I'm sure most people are aware of this. There are now many, many
car companies working on this, and companies of various sorts, AI companies of all sorts, working on trying to build cars that can drive themselves. And it's probably, I believe, the closest that we've come to having actual robots with some autonomous algorithms that don't have a human in
the loop, except when they do, but basically don't
have a human in the loop some of the time. And they're out there interacting with people in real world settings that really have major consequences, including life and death
consequences, okay. So, where are we in this enterprise? Well, this article was published in the New York Times in 2010 and just sort of announcing
these first efforts that Google was working on. It's now, well, 11 and a half years later, by some estimates $100 billion, $100 billion have been invested in the enterprise of
building self-driving cars, and we're still not there. This is a recent article from
just a few months ago in 2021 about "The Costly Pursuit
of Self-Driving Cars Going On And On And On." And, oh, what's sometimes called
the long tail of problems. All the cases that you
didn't really anticipate, that somebody didn't have data to learn from in their
machine learning system, that only come up as
you make some progress, and then you realize, nope, there are so many more and more and more problems that we just haven't
solved, now 12 years later. Waymo again, which is the
Google or Alphabet company, well by this recent
article in Businessweek, "Is 99% of the Way There, But
The Last 1% is the Hardest." I don't think we're 99% of the way there. I think that they've been
saying that for a while and I don't think they
would even say that. The weird echo thing for me. It's not just Waymo, but a number of companies that have been promising robotaxis. "Where are all the robotaxis
we were promised? Well." Yeah, they're not here yet. It turns out that we need some
fundamental advances in AI. This is actually quite a good article in terms of diagnosing, I think, some of the things that are missing. And I'm not gonna go into it, but it's gonna parallel
what I'm talking about here. Or another recent article
from the Wall Street Journal, "Self-Driving Cars Could
Even be Decades Away, No Matter What Elon Musk Says," because AI will have to get a lot smarter. So again, I don't wanna
knock this technology. It's an amazing achievement, okay, that it works at all. And in some sense, it makes a lot of sense that it doesn't fully work yet and it's hard to even
know when it will work. What's going on? Well, the technology,
there's lots of things that go on in autonomous driving. But there's a basic thing that's been driving a lot of the progress, which also drives most
of the other progress in AI technologies, which is a certain way
to answer the question of what kind of computation
cognition might be. And you could say it's basically, oh, - Just wanna turn your speaker off. - Oh, sorry. - Turn my speaker off.
- Yes. - There's no audio? Okay. There we go. Good. - Yeah. Okay. - Okay. Perfect.
- Thank you. - It's a certain kind of
machine learning approach, which you could call, it's often called deep learning and end to end neural networks. But if you don't know
what neural networks are, I'm sure probably most of
you do know what they are. And I mean the artificial
kind of neural network that is an AI technology. These are functional mappings. Functions that map from inputs to outputs. The inputs are usually
some kind of sense data, like the sensors of the car
and the outputs are behavior. It might be conceptual behavior like recognizing a pedestrian or another car or making a decision, okay. And some complex function that is implemented in the machine, has many, many parameters
that can be trained. They're called neural networks because each little bit of that function, what's called a unit in a neural network, of which there could be
millions or billions, it's often hard to keep track of how many there are
in some of these models, is sort of an abstraction of a neuron or a nerve cell in the cortex, shown here from the famous early studies of Hubel and Wiesel, and early
mathematical abstractions. What was sometimes called the perceptron, going back to the early 1960s, this work, that then over a series of
decades turned into increasingly bigger models that represented bigger, more scalable attempts to basically build these large trainable functional mappings. So for example, in the work
of Yann LeCun in the 1980s, this is what was one of the early deep convolutional neural
networks that would again, take in for example, a letter or a digit and classify it as
giving it a label A or 3. A little more than 10 years ago, there was a big breakthrough
of progress, a moment of progress, where people started applying these to datasets of natural images and
scenes in computer vision, like the famous ImageNet dataset. And this is the famous
AlexNet Architecture. And you can see that going from
a single nerve-like element that takes inputs and weights and combines them with a linear sum and then a threshold, as neurons do, to one layer, multi-layer, many, many, many-layered networks.
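Just to make that picture concrete, here is a minimal sketch, in plain Python with NumPy, of the kind of trainable functional mapping being described: one unit takes a weighted sum of its inputs and passes it through a threshold-like nonlinearity, and a network is just many of these units stacked into layers. The sizes, the ReLU nonlinearity, and the random weights are all arbitrary stand-ins for illustration, not anything from an actual vision or driving system.

```python
import numpy as np

def unit(x, w, b):
    """One 'unit': a weighted sum of its inputs, then a threshold-like nonlinearity."""
    return max(0.0, float(np.dot(w, x) + b))   # ReLU in place of a hard threshold

def layer(x, W, b):
    """A layer is just many units looking at the same inputs."""
    return np.array([unit(x, W[i], b[i]) for i in range(len(b))])

# Toy 'deep' network: sensor-like input in, behavior-relevant scores out.
rng = np.random.default_rng(0)
x = rng.random(16)                              # stand-in for pixels / sensor readings
W1, b1 = rng.normal(size=(8, 16)), np.zeros(8)  # trainable parameters
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

scores = layer(layer(x, W1, b1), W2, b2)        # e.g. scores for {pedestrian, car, neither}
print(scores)
```

In a real system, those weights would be adjusted by training on data rather than set at random. At this point, these systems get deployed in all sorts of ways,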
including inside, for example, this is inside the Tesla AI system, not the current state of the art, but a couple of generations ago. But basically, the same kind of technology used to detect all these
things out there on the roads. This is a slide from some work
of one of my MIT colleagues, Jim DiCarlo and his collaborator,
Dan Yamins and others, where they've take these models, which were originally abstracted from, not just how neurons in the brain work, but neurons in visual cortex, the part of the brain that actually is the front end to object recognition. And then, they've gone back, and many others have done this, but their work is especially well known, and taken the same kind of models that have been used in computer vision and shown that they can be
used to make predictions, pretty good quantitative
about the behaviors of neurons in responses to images. So, it's a compelling story in that these models have been used
for practical applications. And though, you know, this is, I'm not focusing here
on like human behavior, but these models can, to some extent, capture some aspects of human behavior. They can also capture
the behavior of neurons. And yet, they're also
missing a huge amount, okay. And this is the gap that
I really wanna focus on that motivates the work that we're doing. While some part of intelligence is probably about training functions to approximate some classifier that can recognize a pattern like a pedestrian in an image, so much of intelligence goes beyond that. All these activities I sort of lump under modeling the world. So, not just finding patterns in pixels, but being able to actually explain and understand what we see, or to be able to imagine things that we could see, maybe things that nobody's ever seen, and then think about what it would be like if the world had those things in it. Maybe set those as goals that we could make plans to achieve, and then solve the problems that come up along the way. This is intelligence, right. And then, learning is actually
building these models. So, that might mean adjusting or refining, a preexisting model, or constructing whole new models, maybe by combining pieces of other models, or actually, maybe even
synthesizing a model from scratch in some ways. It's all these activities that I think, many cognitive scientists
are interested in. And in our work, we're trying to capture these things in computational terms. So the question for me here is, what computations can, in a formal way and in a quantitatively testable way, express these human
capacities for modeling the world and not just finding
patterns in data, right. Now, we're far from being
able to do these things at the scale that Silicon
Valley is ready to invest in. Maybe they would, if they knew more about what we were doing, but it's much earlier in the, let's call it the
scaling curve, all right. What we've really been
focused on in our work over the last few years is kind of, the earliest most basic sense
of how we model the world, which I like to call
the common sense core. It's heavily influenced
by a program of research that sometimes referred
to as core knowledge in human cognitive development. Liz Spelke, who is one
of the leading sort of, developers of that whole research program, probably the world's
leading expert in that, she's gonna be one of your speakers, actually the last speaker. So, you'll see a lot of connection between what I'm talking
about today and her work. She's a friend and collaborator also. And I think when I talk about
the common sense core here, it's also deeply connected
to the basic topics in metaphysics that I think
the class is studying. It's this idea that Spelke and many others have developed
in developmental psychology, as well as people studying the human brain and some amount of comparative work with other non-human species. That, from the very beginning, in the earliest in youngest
infants that we can study, even say three month old babies, and in ways that are to some extent, but not completely probably, shared with the brains of
other non-human species, we are built to understand the world, not just in terms of
patterns and pixel let's say, but in terms of basic kinds of concepts. So, we often talk about
an intuitive physics and an intuitive psychology or objects, actual physical things in the world, like this thing I'm picking up, which even if you'd never seen a cell phone or an iPhone before, you know there's a thing
I'm picking up, right. And you know that if I
were to let go of it here, that it would fall. Maybe there would be a sound.
Maybe there would be something worse. I'm not gonna do that. But okay, that's an
example of an understanding of physical objects and some
of the intuitive physics that humans have from
even at a very early age. We talk about intuitive psychology, and there we mean agents, that could be humans or
could be other animals, or maybe a robot someday,
or a self-driving car, but an entity that has goals and some model of the world itself and acts in some way to achieve its goals, given its model, okay. And the idea that we understand the world in terms of physical objects and goal driven intentional agents and their causal interactions is the heart of our common sense. And what I wanna talk about is how we can try to capture
that set of concepts, those cognitive systems
in computational terms. I'm not gonna talk too much
about the brain side of this, but there'll be maybe a
little bit depending on time, or we can talk about this. So I'm mostly gonna be focusing
on the cognitive aspect. But it's very interesting
that these systems also seem to correspond to
some of the large scale architecture of the brain, okay. And happy to talk about that and what the metaphysics of that might be telling us as well. The work here is also, I mean, for people who are
familiar with the different sort of, subfields of cognitive science, one of the reasons why I like it is it's really at the intersection of the different
traditional subdisciplines: perception, language, action planning. This is the basic stuff that
brings all those together. The targets of perception, the substrate of action planning and also the substrate
of what we talk about, at least at the very beginning, when we're first learning language, right. We human adults have the ability to do what I'm doing right
now and what you're doing, which is communicate, talk and understand about things that are not part of our basic understanding
of objects like this and agents that pursue
goals in the world, okay. We can talk about talking about that, talk about modeling that as
well as quantum mechanics and the origin of the universe
and the origin of life and metaphysics and so many other topics. But in some sense, this
is where it starts, okay. So to get a little bit more concrete, by intuitive physics in
young children, I mean, the kind of thing that say,
this baby is doing here, this one and a half year old, stacking up cups or
playing with their toys. Now, we have robots that can
do things like pick up objects and do simple kinds of
stacking operations. But we don't have any robot that can do what this kid is doing, including the physical dexterity, the ability to sort of
solve the little problems that come up along the way when you're trying to achieve your goal. A little bit of debugging going on here. But even just to make the plan, which you can see that
this kid is doing, right. Even if you haven't seen this video before you can see the stack of three cups, you can kind of guess
where this is gonna go if I jump forward into the video, right. To conceive of the goal, to make the plan. You know if we had robots
that could do this, it would be quite amazing, all right. By intuitive psychology, I'll show one very famous
video from a very famous study by Felix Warneken and Michael Tomasello. This is also with a one
and a half year old. But here the one and a half year old is this little kid in the back who's a participant in the experiment. So, he is watching an adult, that's Felix, do something that he's
never really seen before. I mean, this is a not crazy action, but it's a little bit unfamiliar if you haven't seen this video. And now, watch what happens
when the adult stops moving, the kid goes over, opens the door, yeah. (audience laughing)
Yeah. So you guys are totally with me 'cause you smiled and laughed and even awed all the best parts, right? Somehow that kid is able to figure out what this guy's doing for an action that is not quite like
anything he's seen before and then even figure out
how to help him, right. And you can see that helping goes on when he goes over, opens the door, and in some sense, like the best part, both emotionally and
I think scientifically is when he steps back and looks up and makes eye contact, right, at him, and then kind of looks down at his hands because what you're seeing
when you see that right, is at least a sign suggestive
that the kid has understood what the adult is trying to do and is maybe almost even signaling, like, I think I've
understood what you've done, and so I think I know
what you're gonna do, which is now do something
with your hands, okay. So again, we don't have robots that can do anything like that. But imagine if we did, they would be very
helpful around the house as well as other situations. And it's not my goal to build
those robots, particularly, although I work with colleagues who do, but the goal for me is
to try to understand in those kinds of engineering terms, what's going on inside that kid's head. Now, I'm gonna show one more video, which I know some of you have seen. It's one of the most famous
videos in cognitive science. And I think it's not
unreasonable to call this the most important two
minutes in cognitive science. Brian might have other things like maybe one of his
experiments, but (laughing) this is the famous
Heider and Simmel movie. I'm not even gonna show all one and a half or two minutes of it, but
just a brief excerpt. But again, I'm sure many
of you have seen this, but it really is like, if
you wanted one short video that makes the questions of metaphysics and
cognitive science, you know, compelling as well as the
computational challenges, this would be it. This is from a study in the 1940s in which people were just shown
this video of some shapes, just moving around on a tabletop. And yet what they see is much more than just simple shapes
moving in two dimensions, right. You see what looks like an
interaction between agents. One that is also sitting
on top of physics. One that is maybe a little
bit not super positive, right, at this point. Maybe there would be some
scary music right now, if this wasn't a silent film. Yeah. Yes. What's gonna happen? All right. So there's a little bit more,
but we'll stop there, yeah. So, it ends at least somewhat happily for several of the characters, okay. So what's going on here, right? I mean again, you can describe
this in geometric terms, but what we actually see
is all these other things. We see physical objects
as well as constraints like what is solid, not penetrable, what's fixed, what's movable, that when the door locks you see it sort of get suddenly attached. You see events and causal
interactions like collisions. One thing hits another and it moves, but also there's pushing and shoving. And that gives you an understanding in terms of, like, agents might
have goals towards each other, trying to hurt or help each other when one's trying to
escape or trap another one. You see relationships
like you see those two as being kind of on the same side, they're friends or something, the other guys kind of an enemy. You experience, or you know,
maybe your own emotions, but you experience them
experiencing emotions, right, and probably make moral judgments as well. So, you know, in a sense,
this is the full challenge of trying to understand common sense in computational terms:
could we build an algorithm that could look at a video like this, or any movie of any groups
of people interacting and make sense of it in all these terms. Now, again, we're far from
that, but that's the goal, that's kind of what we're aiming towards. And so, what I'm gonna talk about today, primarily, is how we can represent these things computationally, and a little bit about how we study them quantitatively. And then, you know, at least
gesture at these questions of how they might be
instantiated in the brain, which maybe gives a different perspective on what it might mean for the
mind to really work this way, if its circuitry, in some sense, implements these computations, and then say a little
bit, maybe about learning, both how we learn, let's
say our intuitive physics and also how learning can
take us beyond these things. So the key computational ideas here, and I'm just gonna try
to be mindful of the time and make sure that I get to
at least enough of the content so we can have a reasonable discussion. And there's no clocks, so I'm gonna just keep
referring to my watch here. But if I lose track of it, just, you know, let me know when it's time
to stop basically, okay. So, you know, to answer the question, what kind of computation is cognition? It's not just neural networks for pattern recognition
and function approximation. Well, here's one proposal for some of the other things we need. In our work, we often use the phrase probabilistic programs or
probabilistic programming, which is kind of one of these jargon terms like neural networks, okay. It has something to do with
probabilities and programming, just like neural networks has
something to do with neurons. But what it really refers
to is a whole, you could call it, computational paradigm. There's math, there's programming languages, there's systems and platforms that I know some of you are familiar with, some of you even use in your research. But you can think of
it as ways of realizing in practical computational systems, a synthesis of several good
ideas or several paradigms or broad ways of thinking about cognition and computational terms. One is the neural network
pattern recognition idea. Another is what is the oldest
and arguably most important way to think about intelligence
in computational terms, which is the idea of symbol manipulation, or having symbolic languages for abstract knowledge
representation and reasoning. In the early days of AI, as well as in cognitive science, that's what everybody thought about. And as these fields
have had their successes and their failures and gone up and down, often, the so-called
symbolic approach to AI has gotten kind of a bad rep, basically. People have said, "Well, early
promises of AI didn't work because everybody thought
we should be using symbols." And then we realized we
had to use neural networks. But that is let's call it fake news, okay. That's the nice word for it. Because, you know, if you had to nominate one of these ideas to be
the most important one, that's the most important. If we didn't have symbolic languages, we wouldn't have, you
know, natural language, we wouldn't have mathematics, we wouldn't have all of computing, we wouldn't have
programming languages, whether it's Lisp or C or Python or modern programming
languages for deep learning, like TensorFlow or PyTorch, we wouldn't have anything basically. So, that's an absolutely central idea. And we need ways of thinking
about common sense knowledge that integrates learning
and symbols, okay. And then a third idea, which is the one, I guess my work has been
most associated with and is one that, when I was in grad school and in earlier times of my career, was kind of inarguably the dominant idea. Each of these ideas has had its moments in the sun as well as in the shade. But the idea of probabilistic inference or Bayesian inference, by which we mean, often, the Bayesian part of probabilistic inference is kind of inverting conditional probabilities, specifically in a causal setting where we have models of how some things are caused by other things, let's say effects are caused by underlying latent causes in the world. We observe the effects and wanna work backwards to make good guesses about the things that caused them, okay.
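To pin down what that inversion means, here is a minimal sketch in Python. The scenario and all the numbers are invented purely for illustration, a noisy alarm that could be caused by a burglar, the wind, or nothing; the only point is the direction of the computation: write down how causes produce effects, then use Bayes' rule to go from an observed effect back to a guess about its cause.

```python
# Hypothetical causes and their prior probabilities (made-up numbers).
prior = {"burglar": 0.01, "wind": 0.30, "nothing": 0.69}

# Causal model: probability that the alarm goes off, given each cause.
likelihood_alarm = {"burglar": 0.95, "wind": 0.20, "nothing": 0.01}

# We observe the effect (the alarm went off) and work backwards with Bayes' rule:
# P(cause | alarm) is proportional to P(alarm | cause) * P(cause)
unnormalized = {c: likelihood_alarm[c] * prior[c] for c in prior}
total = sum(unnormalized.values())
posterior = {c: p / total for c, p in unnormalized.items()}

for cause, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"P({cause} | alarm) = {p:.3f}")
```

And that idea is absolutely central,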
science for understanding whether it's perception or learning, or how we make sense of sparse, uncertain, incomplete, otherwise, and otherwise ambiguous patterns of data, which whether that we're
talking about perception or language or learning,
you're always in that setting. And probabilistic programs and the discipline of
probabilistic programming and probabilistic programming languages, basically let us bring
these ideas together to define models of, well, to define scientific models, which are models of the
models inside your head for these common sense intuitive physics or intuitive psychology
using symbolic languages that support causal models
and then probabilistic or Bayesian inference so
that we can, in a sense, like run these programs backwards, to infer the underlying
things in the world, from the effects that we observe, or the effects that those
things in the world, cause on our senses, all right. And to use neural networks or
other machine learning tools to amplify and extend what we can do with probabilistic inference
over these symbolic programs. If you want to learn
more about this approach beyond what I can talk about in just high level terms in this talk, I would encourage you to
check out any of these various probabilistic
programming languages or this web book, probmods.org, which was written by Noah Goodman and a number of other colleagues. Noah is a Professor at Stanford. We've worked together for a long time. I co-wrote the first
draft of this with Noah, but it's gone through many
iterations since then. A lot of other people have contributed. And Noah's been the one
mostly carrying it forward. But it's a nice introduction
to probabilistic programming and cognitive models based on
probabilistic programs, okay. There's another important idea, another kind of computational tool for capturing common sense, which is this idea that I
described with the slogan, "The game engine in your head." And this is the idea that
tools from game engines, which again, I think probably
many of you are familiar with, some of you probably even programming with them. These are tools that were developed
to allow somebody or a team to create a new video
game much more easily than they would otherwise, in particular, often to create games that have some rich immersive experience for a player in some, you
know, three dimensional world, maybe it's outer space or
under sea or the wild west or dinosaurs or something
that has never existed except in the mind of the game designers, but could exist in some possible world. But to create a rich, immersive,
interactive experience without having to write
everything from scratch. So, without having to like
write all of computer graphics, but still make the world look really good, often, nearly photorealistic and respond to what the player does. So, as the player moves around the world or the images on the screen
change in the appropriate way so, you feel like you're
moving through this world or as you interact with the
world, it reacts accordingly. Like, so if I pick up an object
and drop it, it has to fall. If it's glass or something
fragile, it might break, okay. So, these game engines often
have like graphics engines and physics engines, as well as what's sometimes called
AI engines or game AI, by which they mean tools for simulating. It's not really AI, but
it's tools for simulating like the non-player
characters in the game. So they react in some vaguely intelligent way rather than just a robotic way. Like, for example, a guard at a base, instead of just shooting randomly into the air when the player is trying to invade, might see you and go after you, and you have to hide so they don't see you, for example, okay. Here's an illustration
of a game physics engine. We use these a lot in our work. They, in some sense, just
implement physics, right, like whether it's Newtonian mechanics or other kinds of, you know, fluid mechanics, soft body or cloth physics. But they do it in a way that is designed
physicist wants in a sense, maybe, but to just look good, look good enough, to be very general, to be able to model all these different kinds of things, including simple systems of
a few balls bouncing around and much more complex systems like hundreds or thousands of blocks, like in this wrecking
ball case, and other very kind of complex non-rigid or non-solid materials. And to be able to look good, or look reasonable, in what
they do going forward, just, you know, on the scale of maybe one or a couple of seconds, all right. And to be able to run in real time, so it can be interactive with a player or maybe even faster than real time, okay. Now these tools have
increasingly been used in many areas of AI as a training ground like for an AI algorithm. So, there might be like a typical thing that somebody might do in a
machine learning approach to AI is to have a reinforcement
learning agent that, like, is deployed into one of these simulators and then does some things and learns some input-output mapping, some policy as it's called, a mapping from sense data to behavior. And then maybe, you'll
deploy it in the real world. And it has what's called
a Sim-to-real problem. It has to figure out how to go from its training ground in the simulator to act okay in the real world. A lot of what is going on in self-driving cars is the Sim-to-real challenge, and the fact that, like, people are trying to create in the simulator every possible thing that could happen in the real world, and that's not possible, okay. But the reason why we call this "the game engine in your head" is because the idea is
that the game engine is a model of the model
inside the head, okay. Not the training grounds
for a learning algorithm, but a model of the mental
models that when say, a kid like this one here, sees this stack of blocks
and the toy bird on top, and has the ball in his hand and imagines, well, what would happen if
I roll the ball forward? Oh, maybe that would happen. Maybe it would knock over the blocks. Maybe the bird would fall. I'm not sure. But imagining what might happen, then allows me to decide
if that's what I wanna do, or maybe it's not what I wanna do, or maybe I should do something
differently and so on, okay. So, the kind of models that we build are, we take these kinds of simulation programs for say, physics or analogous
ones effectively for agents, we wrap them inside frameworks
for probabilistic inference so that we can, for example, sorry that the zoom here is cutting off the bottom of the slides, but so for example, we
can build the models, like what I'm sketching here. These are sketches of the models we build, not of the physical world, but of the model inside the
kid's head for a physical world or the model inside that kid's head for what somebody is doing, okay. So, models that might take an
image or a sequence of images and make an inference
to what's the underlying state of the world and its dynamics, that I could then predict forward, what I might see at future times, maybe conditioned on my
own physical actions. Or this kind of thing here, which is not meant to be a
picture of how your brain works, but how your brain thinks brains work or what we sometimes call
theory of mind, right. It's a standard sort of picture of an intuitive model of agent's minds, where agents have some
kind of goals or desires, they have some beliefs
about the state of the world and their state in it, which is a function of
their perception system, they make plans to come up with actions that are reasonable ways, efficient ways, to achieve their desires,
given their beliefs and then they exert actions on the world, which change the state of the world, or change their own state
and so on and so on. And so this is just a way that we are used to often talking about how our minds work. That is the sort of standard model in cognitive science of how even a young child might understand somebody else's actions. And what might go on in a scene like that is observing actions, given also your observations of the world state and the agent state, to try to work backwards and fill in these things, to try to infer the underlying mental states, or beliefs and desires, of an agent in order to make sense of their behavior.
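Here is one toy way to turn that picture into an inference, in the spirit of what is sometimes called inverse planning. Everything in it is invented for illustration, the grid, the two candidate goals, and the softmax noise parameter; the point is only the direction of inference: assume the agent tends to move efficiently toward whatever its goal is, score the observed movements under each candidate goal, and apply Bayes' rule to infer which goal best explains the behavior.

```python
import math

goals = {"gold": (4, 0), "door": (0, 4)}        # hypothetical goal locations
prior = {"gold": 0.5, "door": 0.5}

def step_prob(pos, nxt, goal, beta=2.0):
    """Probability of one move if the agent is (noisily) heading toward `goal`."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    def dist(p, q):
        return abs(p[0] - q[0]) + abs(p[1] - q[1])
    scores = {m: -dist((pos[0] + m[0], pos[1] + m[1]), goal) for m in moves}
    z = sum(math.exp(beta * s) for s in scores.values())
    taken = (nxt[0] - pos[0], nxt[1] - pos[1])
    return math.exp(beta * scores[taken]) / z

# Observed trajectory: the agent keeps moving to the right.
trajectory = [(0, 0), (1, 0), (2, 0), (3, 0)]

posterior = dict(prior)
for pos, nxt in zip(trajectory, trajectory[1:]):
    for g in posterior:
        posterior[g] *= step_prob(pos, nxt, goals[g])
total = sum(posterior.values())
posterior = {g: p / total for g, p in posterior.items()}
print(posterior)   # moving right is much better explained by the 'gold' goal
```

So our job here is to take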
these sketches and turn them into working quantitatively
testable computational models. So for example, in intuitive physics, this is how we've done this. And this is work that, in our lab, we started doing together with Pete Battaglia and Jess Hamrick. And they're the names on these slides. And there's a lot of other work; especially Kevin Smith and Tomer Ullman did key things in this research program. But I'm just gonna tell you
about one or two studies here that go back to the earlier
work of Battaglia and Hamrick. I should say also, some key work was done by one of our colleagues
here, Ilker Yildirim, who used to be in our lab and is now an Associate
Professor in Psychology and other fields here at Yale, and continues to do really exciting work that relates to intuitive
physics and perception. So, if you're interested in this topic, Ilker's one of the best people
in the world who does it. And so, I highly recommend you
check out some of his work. But here's like one of the
very first models we built, like more than 10 years ago at this point. The idea is, we're giving people, sorry, I skipped over this, we give people these like
scenes of these blocks, they're kind of simulations
based on the game Jenga, which you're probably familiar with, but they're colored in this way. And these different stacks of blocks, some of them will look to
you like they are stable, others will look like they
should be falling over. You might have to make a judgment of, will it fall over or not? Or you can make a graded judgment, like on a scale of one to seven, and you can do it either fast or slow. It actually doesn't make that much difference in our experiments. Actually, in Brian's lab with Chaz Firestone, they've done sort of versions of this where people respond extremely quickly, they get a brief presentation. And you get pretty similar results whether people are making very
fast perceptual judgements or slower, more considered judgements. People are pretty good at this problem of being able to judge
whether a stack of blocks is gonna fall over or as I'll show you, even to judge in a graded
way how unstable it is. What does it mean to be pretty
good is actually something that I'm gonna try to
be more precise about. But the way we capture what's
going on here is we say, you observe the image and somehow you have to work backwards
to the underlying world state which is the three dimensional configuration of these blocks, their geometry and whatever
is enough about the physics, like mass, friction, the parameters, the basic parameters of physics that the physics engine needs
for its simulation, okay. And we think of perception, basically perception is the
inverse arrow to this one. So, given an image or
a sequence of images, we wanna work backwards. This is sometimes called inverse graphics to figure out what was the
input to the graphics program. That is the thing that
would render the image from the underlying world state, okay. And I'm not gonna focus in this talk or really tell you at all, how we do that. This is actually something
that appropriately trained neural networks can be good at learning this inverse mapping. You can also use other kinds of, approximate Bayesian computation,
more sort of top down, like, guess and check
Monte Carlo algorithms, if you're familiar with that. And again, Ilker actually,
has explored a number of those things in his
work, very interestingly. Here, we're just gonna
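To give a flavor of the guess-and-check idea, here is a toy sketch in Python. Everything in it is a stand-in: the "scene" is just three numbers, "rendering" is a trivial made-up function, and the tolerance is arbitrary. The shape of the algorithm is the point: propose a candidate world state, render it, and keep it only if the rendering is close enough to what was observed; the kept guesses approximate the posterior over scenes given the image.

```python
import random

def render(scene):
    """Toy stand-in for a graphics program: scene -> 'image' (here, just sums of neighbors)."""
    return [x + 0.5 * y for x, y in zip(scene, scene[1:] + scene[:1])]

def close(img_a, img_b, tol=0.3):
    return all(abs(a - b) < tol for a, b in zip(img_a, img_b))

true_scene = [1.0, 2.0, 0.5]          # unknown world state we want to recover
observed = render(true_scene)          # what the senses deliver

random.seed(0)
accepted = []
while len(accepted) < 20:              # guess and check until we have some samples
    guess = [random.uniform(0.0, 3.0) for _ in range(3)]
    if close(render(guess), observed):
        accepted.append(guess)

# The accepted guesses approximate the posterior over scenes given the 'image'.
means = [sum(vals) / len(vals) for vals in zip(*accepted)]
print("posterior mean scene:", [round(m, 2) for m in means])
```

Here, we're just gonna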
assume that somehow, given an image, you
get a reasonable guess, not perfect, but a reasonable guess of the positions of these
three dimensional blocks in the world, and sizes and shapes. And then that's also the state of
this physics simulator, where now you can kind of
run your simulation forward, your fast and rough
and ready approximation to Newtonian mechanics, run it forward a few time
steps and see what happens. And if you take this rendering here, which is not exactly the positions
of the blocks shown here, but pretty close, and
you run that forward, well, this is what you get
after a few time steps. Think of that as one what's called posterior predictive sample
in this probabilistic, approximate simulation based
intuitive physics model. That's a lot of buzzwords, but hopefully you've seen enough of what
I've been talking about to get some sense of what those words are getting at, all right. Here's another sample. I'll just flash back
and forth between them. The difference is just the initial guess of where the blocks are. This is a less good guess. If you look at this image down
here, hopefully you'd agree, this isn't like a crazy guess compared to lots of other
positions of blocks, but it's a less good guess, all right? And you know, we assume
that our perceptual system is just giving us some approximation to the true three
dimensional scene structure. But whether you made
this guess or this guess, when you run it forward
in your physics simulator, basically the same thing happens. At the fine scale detail, very different things happen, okay. So, the final configuration
of the blocks here is quite different than the one here. But intuitive physics
doesn't care about that. What intuitive physics cares about is just, most of the blocks fell, okay. So, that's the basis of this model. You run a couple of those simulations, a relatively small number, like maybe three, five, seven, it depends; I mean, we can try to
measure this quantitatively. I'm not gonna show you how, but like more than one, but not very many. And that yields a fit to behavioral data that looks like this here. Ignore the thing on the right for now, just look at the plot on the left. So, what that scatter plot
is showing is it's plotting, the vertical axis is plotting
the average human judgment on a scale of one to seven, where the high end means people think it's very unstable. Those blocks are gonna all fall over. The low end means very
stable, not gonna move, okay. Each plus represents one stimulus or one block tower scene
such as one of the three shown here, but in this experiment, there were 60, okay. The pluses are error bars, I think like 95% confidence intervals, both for human judgments
and for the model. So, the model is shown
on the horizontal axis and that from running a small
number of the simulations, like I showed you and
just computing the average or expected number of locks that fell. And what you can see is it
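To give a feel for that pipeline in code, here is a deliberately crude sketch in Python. It is a one-dimensional caricature, not a real 3D physics engine: the "physics" is a single center-of-mass overhang rule, the perceptual noise level and number of simulations are arbitrary, and the towers are invented. But it has the same shape as the model: a noisy guess about where the blocks are, a handful of approximate forward simulations, and a judgment read off the average number of blocks that fall.

```python
import random

def blocks_fallen(centers, half_width=0.5):
    """Toy 'physics': blocks above level i topple if their combined center of
    mass overhangs the block below by more than half a block width."""
    for i in range(len(centers) - 1):
        above = centers[i + 1:]
        if abs(sum(above) / len(above) - centers[i]) > half_width:
            return len(above)          # everything above the weak point falls
    return 0

def judged_instability(observed_centers, noise=0.2, n_sims=5):
    """Noisy perception plus a few forward simulations, averaged."""
    total = 0
    for _ in range(n_sims):
        guess = [c + random.gauss(0.0, noise) for c in observed_centers]
        total += blocks_fallen(guess)
    return total / n_sims

random.seed(0)
stable_tower = [0.0, 0.1, 0.0, -0.1]   # well-aligned stack
risky_tower  = [0.0, 0.3, 0.6, 0.9]    # leaning stack
print(judged_instability(stable_tower), judged_instability(risky_tower))
```

And what you can see is it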
gives a pretty good fit, okay. This is, when we say
people are pretty good, that's what we mean here is that, they're pretty well
captured by this model. But there's another sense in
which they're not very good and that's what's shown over here. This is in a sense,
the more correct model. This is the same physics simulator, but it doesn't have any uncertainty as the first model does
in where the blocks are. So, this is like what you would get, if you could perfectly localize the position of every block, all right. That's why there's no error
bars on this side, right. The vertical numbers and error bars are the same. That's the human judgements, but we've re-plotted the data with a different x-axis showing the actual number of blocks that fall in each case, when you run that simulation
with the ground truth physics and the ground truth
correct object positions. And it's a less good model
of people in the sense that the correlation,
if you were to measure, this is about what we
call a 0.9 correlation, or explains about 80% of the variance. The model explains that in the human data. This is more of a 0.6 correlation, which is about 36% of the variance, right. So, it's much less good model. But it also shows that in some sense, people aren't that good. If you judge by the actual correct answer that you would want in your physics exam, it would be this one, okay. But what I would argue is that actually, this is the more useful one. If you want to build a robot, that's actually gonna do
this in the real world where there's always
gonna be some uncertainty in its perceptual system, you want it to be robust to uncertainty. And in a sense, what you're
seeing here is that this model, I would argue, is just
not robust to uncertainty. In a sense you could say, people have, again, I'm sure you're probably familiar with visual illusions and what they tell us about perception. This model shows that people suffer from what you could call
a stability illusion. People see stimuli like this red dot, which corresponds to this red tower here, which I'm sure most of
you think is unstable and should be falling over. But in fact, it's
actually perfectly stable. That's why the ground truth
model, it shows up as zero here. And there's many other things like this, many other stimuli to
some extent like this, where in the ground truth physics model, they're either perfectly stable or maybe just one block falls over, but people think they're
much more unstable so the judgements are much higher. What we show in this work right, is that that actually
is captured pretty well by this probabilistic physics simulation. It's just that the deterministic
one doesn't capture it. So, glass half full or
half empty, you can say, well, this just shows the limitations in our intuitive physics, the ways in which we're not that accurate. Or you could say, well, this shows the way that our system is designed in its inherent computational
architecture to be robust, to inevitable uncertainties, okay. So, this is just sort of a little again, little mini microcosm
of a much bigger pattern of experimental things that
we've studied in our lab and many others have
studied at this point. I'm not gonna go into much more detail. But you know, we can
ask many other questions like of the same model. Which way will the blocks
fall? How far will they fall? What happens if one color is
much heavier than the other? Notice how these two towers
here have the same geometry, but they get colored differently and you make different predictions. By seeing these towers
here that maybe look like they should be falling, but they aren't, you can make judgements about how heavy or light some object might be. And we can do many other studies like this. I'll show you, since I'm just checking the time and I don't wanna run over too much, I'll show you just one other
kind of application of this, which I think has interesting connections to the metaphysics side of things that I think you've studied, which is work from Toby
Gerstenberg and colleagues. Toby was a postdoc in our
lab for a number of years. He's now an Assistant Professor
at Stanford in Psychology. And he's looked systematically at how people make judgements about
causal responsibility, looking at various kinds of events, especially dynamic events like these. Literally looking at
how people look at this. What you can see here is that blue dot, you might recognize is an eye tracker. So it's a trace of where somebody looked as they were watching these movies. And a typical kind of Toby experiment is what's illustrated here,
where people are asked in some kinds of trials
to make a prediction. They see that this is like
a billiard ball scene. I hope you can see. So this is a dynamic
intuitive physics setting. And you might make a prediction of what's going to happen when A hits B. Will it go in the hole or not? Oh yeah. It just barely went in, okay. In other trials, like what's here, the question is not to make a prediction, but to make it more of
an explanation or to say, well, in this scene, did A cause
B to go in the hole, right? So, watch it and ask
yourself that question. So, did A cause B to go in the hole? All right. Well, what Toby found and the reason why I'm
showing this here is, you can see just in how
people look at these scenes, depending on the question
that they're asking, they look in a very different way. Look on the left here. When they're making a prediction, okay, they first look at C when
these things are gonna collide and then they just extrapolate forward where B is going, okay. But when they're doing a causal judgment, so they're basically trying
to judge in a graded way, how responsible was A for making B going, oh, look at where they're looking. They're not just looking
at where B is going to go, but where B would have gone
if A hadn't been there. Do you see, how they are sort of, trying to extrapolate B's motion? And that's because there's
a long tradition of work. Some of which is actually the core work that Laurie first became
known for in metaphysics, which is looking at the role of counterfactual analysis in causal relations, basically. And I know this is something, I think, you guys have talked about in the class, so I'm not gonna try to
retell in a limited way. But it's quite striking that you can see in how people look at a scene that they're doing this kind
of counterfactual analysis, in very, very dramatically different proportions, if and only if they're making a causal judgment. Toby can also model this with his probabilistic physics simulation. And this is just a
sketch of how this works. It's very much like the block tower model that I showed, okay. Except that what the
system is trying to do is, it's trying to make, it has to make guesses
of the counterfactual. What would've happened if
A hadn't been in the scene? You don't actually know, right. You can see those eye movements. In a sense, those are the guesses realized right there on the screen in front of you. They're not perfect. They're sort of noisy extrapolations. And the model, though you
can't read it down here, is basically doing similar simulations with kind of noisy estimates of the ball's velocities
and positions, okay. So, the same idea of a noisy or probabilistic approximate physics simulation can be used to capture both how people predict what's gonna happen and the counterfactual probabilities they have to compute in order to make a causal responsibility judgment. And I won't go into the details of how you study this experimentally.
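To make the counterfactual idea concrete, here is a toy sketch in the same spirit, though nothing like the actual stimuli or model code: simulate the outcome many times with noisy estimates of ball B's speed, once with the boost it actually got from A and once as if A had never been there, and read a causal judgment off the difference in the probability that B goes in.

```python
import random

HOLE_DISTANCE = 5.0

def goes_in(speed_b, noise=0.5):
    """Toy physics: B reaches the hole if its (noisily estimated) speed carries it far enough."""
    travelled = speed_b + random.gauss(0.0, noise)   # distance ~ speed in this caricature
    return travelled >= HOLE_DISTANCE

def prob_in(speed_b, n=2000):
    return sum(goes_in(speed_b) for _ in range(n)) / n

random.seed(0)
speed_b_alone = 3.0        # B's own speed (would it have made it on its own?)
boost_from_a  = 2.5        # extra speed B picked up when A hit it

p_actual = prob_in(speed_b_alone + boost_from_a)     # what actually happened
p_counterfactual = prob_in(speed_b_alone)            # if A hadn't been there

print(f"P(B in | with A)    = {p_actual:.2f}")
print(f"P(B in | without A) = {p_counterfactual:.2f}")
print(f"causal judgment     ~ {p_actual - p_counterfactual:.2f}")
```

But Toby has done really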
wonderful experiments where he manipulates the degree of, how close or far the counterfactual is and how that separates from the actual one and gets very beautiful fits to data. It's an advertisement for basically, really elegant work that
shows the quantitative power of these models for capturing
a sense of causality and causal responsibility and not just predicting what's
gonna happen next, okay. One more advertisement. And I'll try to keep this even shorter, but this is recent work from Kelsey Allen and Kevin
Smith who I mentioned, where they're adding in the next step, which is, how do you use
probabilistic intuitive physics to solve problems, okay? In this case, thinking
about the human ability to make and use tools in
novel creative ways, right? Again, when we think about cognition, we think about creative problem solving as a core aspect of where our general purpose
intelligence comes in. Many people who studied the evolution of human intelligence, focus on the human ability
to not just make tools, but to just sort of find
and repurpose things like this or this rock,
or think of all the things I could use this phone for that aren't just making a call, right? Your phone's battery
is dead. Is it useless? Well, it depends, right. Maybe unstick this or,
you know, you have to, well, I won't go into it, okay. But in this paper, which
was published back in 2020, feels like just yesterday,
like a lot of things in 2020, in PNAS, they had people playing this really cool virtual tools game, where they basically have
to solve these problems. The problem is always get the
red ball into the green bowl. And there's many different
things that happen in these different levels of this game. It's kind of inspired
speaking of phones, by these, like, touch physics phone games, which are a popular pastime. And what they show is that basically,
which are popular pastime. And what they show is that basically, a probabilistic simulation based model provides a good account
of the internal process, we pose it, of people
sort of trying out ways they could solve the
problem by picking in a tool and thinking about where they
might place it into the scene. It can capture both where
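Loosely sketched, the spirit of that internal process looks something like the following. This is a made-up one-dimensional stand-in for the game, not the actual model or code from the paper: imagine a few candidate placements in a noisy internal simulator, act on the most promising one, and if it misses, try again, so solutions typically come within a handful of attempts rather than thousands.

```python
import random

TARGET = 7.0                     # where the 'green bowl' sits in this 1-D caricature

def world(placement):
    """Ground-truth outcome: where the ball ends up if the tool is dropped here."""
    return 2.0 * placement - 3.0

def imagine(placement, noise=1.0):
    """Internal simulator: same dynamics, but noisy (imperfect physics knowledge)."""
    return world(placement) + random.gauss(0.0, noise)

def success(ball_position, tolerance=0.5):
    return abs(ball_position - TARGET) < tolerance

random.seed(1)
for attempt in range(1, 11):
    # Imagine a handful of candidate placements and pick the most promising one.
    candidates = [random.uniform(0.0, 10.0) for _ in range(7)]
    best = min(candidates, key=lambda p: abs(imagine(p) - TARGET))
    if success(world(best)):
        print(f"solved on attempt {attempt} with placement {best:.2f}")
        break
    print(f"attempt {attempt}: placement {best:.2f} missed")
```

It can capture both where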
people choose to drop objects as well as sort of the learning dynamics. And what this graph is just showing is, there's sort of this rapid trial and error kind of learning here. Sometimes you might see people talk about reinforcement learning
algorithms in AI and say, well, they learn like people
or animals by trial and error, but unlike what's called reinforcement learning typically in AI, people don't learn from, you know, thousands or millions
of training examples. They do do trial and error
learning as we all often do when we're trying to solve problems. But the real trial and error learning is something that unfolds
over like five or 10 trials. And that's what we see in
this virtual tools game as well as in the model, okay. So, I guess I don't have time to talk about intuitive psychology. But I do wanna advertise two things. One is, if you're interested
in how these kinds of ideas play out in understanding
intuitive psychology, you're again, very lucky here at Yale to have one of the world's experts in Julian Jara-Ettinger
and his research group. They do many things,
but among other things, they have developed models for, and sort of taken the kind of models that I've described here and shown all sorts of really
cool things to do with them. And I'm not gonna go into
the details of those. But I do want to show
you the following work. And again, I guess we could just call it mostly an advertisement. But since I bothered to
show you Heider and Simmel, I wanna emphasize the work that is completely hidden on the bottom because of the Zoom thing. I'm gonna just move that
for this purpose here. It's been a dream of many
cognitive scientists, and certainly of mine, to be able to really study, to take that Heider and Simmel video and basically build models that can really see everything that people can see there. Now, we aren't there yet, okay. But maybe, we're like 99% of
the way. No, not even close. But maybe we're like 40% of the way. So, this is really exciting work by Aviv Netanyahu and Tianmin Shu, they're the co-first authors, and they published a version of it at a recent AI conference. But honestly, it's even more exciting as cognitive science than AI. It's this PHASE paper. And what they've done is they basically built this little domain
that they call flatland, where you can see these agents here that are basically interacting
with physical objects. They can like pick up things,
exert forces, throw them. And it allows us to capture in a controlled
quantitatively studiable way, many of the things that are going on. Not just when you see a single agent, like try to pursue a goal, but when you have multiple
agents interacting, like in the Heider and Simmel video. So in this setup, some agents
are strong, some are weaker. Agents can have different goals. Like an agent's goal, say the red one here, could be
to get to the gold thing, or the agent could just have a goal to get to another entity. The agent here could have a goal to get the blue ball to the red space. These are possible goals
that the agents could have. They also have relationships
and social goals. So, agents could be helpful to each other as you can see here, where
it looks like the red agent is trying to help the green agent get the blue balls to somewhere. They could be adversarial like the way they seem to be fighting over
where this ball is to go. Or they could just be independent like the green one wants to get that ball up to the green space and the red one just wants to go down there. So, these are all possibilities. The cool thing is, these
weren't made by humans. These were made by our models. So, we have a model which is a probabilistic generative model of these multi-agent interactions. It sits on top of a physics engine, 'cause you can see that these agents are interacting with physics. And each agent has its own sort of goal, and a representation of its social relationship and of the other agent's goal. And then they do a fairly complex planning process to generate the sequence of behavior. So, the probabilistic program goes from these underlying physics and social variables and produces the movies. But then you can run it in reverse. It's not easy, but with the right kinds of inference algorithms, you can then also see the movie and work backwards to infer what those goals are, both the individual ones and the social ones. And then also run it again forward and predict what's gonna happen next.
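As a rough illustration of that forward-and-backward structure, here is a hypothetical sketch, not the actual PHASE implementation: plan(), run_physics, and match() stand in for the planner, the physics engine, and a trajectory-similarity score, and the goal and relationship lists are invented.

```python
import random

GOALS = ["reach_gold", "reach_other_agent", "ball_to_red_space"]
RELATIONSHIPS = ["helpful", "adversarial", "independent"]

def sample_scenario():
    # Latent social and physical variables for the two agents.
    return {
        "goals": {a: random.choice(GOALS) for a in ("red", "green")},
        "relationship": random.choice(RELATIONSHIPS),
        "strength": {a: random.uniform(0.5, 1.5) for a in ("red", "green")},
    }

def generate_movie(scenario, plan, run_physics, steps=50):
    """Forward direction: latent goals and relationship -> planned forces -> physics -> movie."""
    state, frames = run_physics.initial_state(), []
    for _ in range(steps):
        forces = {a: plan(a, scenario, state) for a in ("red", "green")}
        state = run_physics.step(state, forces)
        frames.append(state)
    return frames

def infer_scenario(observed, plan, run_physics, match, n_samples=200):
    """Reverse direction: weigh sampled scenarios by how well their movies match the observed one."""
    weighted = [(match(generate_movie(s, plan, run_physics), observed), s)
                for s in (sample_scenario() for _ in range(n_samples))]
    return max(weighted, key=lambda pair: pair[0])[1]    # approximate best explanation
```

Running the best inferred scenario forward again is what would give the prediction of what happens next.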
And so, you can see in this world, with this generative model, it can produce quite interesting behavior. So for example, here,
you have an agent that, well, notice what happens here. You can just see what happens. The green one goes and steals the blue one and then the red one goes
and fights him for it, okay, and successfully gets it back, okay. So there, it's an adversarial interaction, but because the red one didn't
see the green one initially, he kind of left the blue one unprotected, okay. Now notice this scene. The only difference is
that the red one now can see the green one. So, he doesn't go and
leave it unprotected. He stays there to protect it. That's the only difference: the partial perceptual observability that this system gets. Or you can try this. It's a little sort of Turing test here. So on the top, you see two
adversarial interactions. Let me move this again. One is generated by the machine, another is actually generated by two people playing the game. Can you tell which one is the human and which is the machine? Let me give you another thing here. So, on the bottom, this is
now a helpful interaction where the green one is
trying to help the red one, oh sorry, the red, yeah. They're trying to collaborate to get the blue ball
into the yellow square. So, raise your hand if you think the ones on the left are the humans. Raise your hand if you think the ones on the right are the humans. Okay. Well, most of you were right. I should have asked you how confident you are. So, in our actual experiment, well, yeah, now you know the answer. So, when we have people judge these one at a time, people are about equally confident that they are natural, human-generated interactions. When you put them side by side, you can see some subtle cues, which you're clearly all seeing. And I'm happy to discuss afterwards what some of those cues might be. But hopefully, you'll agree that these are fairly natural
kinds of interactions that really capture the ways agents might collaborate or compete, okay. Let me go back for the Zoom thing. Okay. So, there were a couple of things that I didn't get to talk about,
but I will just remind you that I didn't talk about these things, which are: how does this work in the brain? And how might these things be learned? I'll just say the one-sentence answer is, there are particular parts of the brain which, as we can show with functional magnetic resonance imaging, do respond to these sorts of things when given the same kinds of stimuli. And when it comes to learning, we can actually start to build models of how people learn these sorts of things. Laurie's gonna drag me off the stage. (all laughing) That's fine. And the one sentence thing
I wanna leave you with on learning is just the following idea, which is: if we wanna build learning algorithms that can learn something like a simulation program, the learning algorithms, in a sense, have to be what we call program-learning programs, okay. So, what kind of algorithm could take experience as input and produce as output another program, which is itself like a probabilistic, approximate simulator, let's say of physics, okay. This is an interesting challenge, okay.
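Here is a deliberately toy illustration of that idea, a "program-learning program" that searches over compositions of primitive rules rather than tuning a fixed parameter vector. The primitive set and the observed velocity trace are invented for the example and are not from the talk.

```python
import itertools

# A toy "language" of one-step rules acting on a velocity value.
PRIMITIVES = {
    "gravity":  lambda v: v - 0.2,
    "friction": lambda v: v * 0.9,
    "identity": lambda v: v,
}

def make_simulator(rule_names):
    """Compose primitive rules into a candidate simulator program."""
    def step(v):
        for name in rule_names:
            v = PRIMITIVES[name](v)
        return v
    return step

def learn_program(observed, max_length=2):
    """Input: experience (successive observed values). Output: another program (a simulator),
    found by discrete search over program space rather than by gradient descent."""
    def error(step):
        return sum((step(a) - b) ** 2 for a, b in zip(observed, observed[1:]))

    best = None
    for length in range(1, max_length + 1):
        for rules in itertools.product(PRIMITIVES, repeat=length):
            err = error(make_simulator(rules))
            if best is None or err < best[0]:
                best = (err, rules)
    return make_simulator(best[1]), best[1]

# e.g., learn_program([1.0, 0.7, 0.43, 0.187]) recovers the composition ('friction', 'gravity').
```

The contrast drawn next is exactly between this kind of discrete search over structured programs and the smooth, gradient-based parameter tuning used to train neural networks.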
It's not the sort of thing that you have when you're trying to learn a neural network. We sometimes contrast this with learning in a neural network, where there's a smooth error surface and you can use these gradient descent algorithms, basically just multivariable calculus, to optimize all the parameters,
to train the system, to produce the right
input-output behavior. If the goal is to search through the space of all simulation programs, there's no nice topology or geometry; it's a much harder search problem. But somehow, you know, people
are able to solve this, and we wanna understand how. What I will just say is, call this the future of the study of, I think, human learning in computational terms, and of more human-like machine learning. And the advertisement here
is for a recent opinion piece in "Trends in Cognitive Sciences" by Josh Rule, another former student and recent graduate, along with Steve Piantadosi, where we introduce this metaphor, perhaps an ill-chosen name: "The Child as Hacker". You might have heard of the "child as scientist" idea, the idea that children's learning is like a kind of scientific hypothesis testing and experimentation. By hacker here, we mean
like not the bad guys who break into your email and steal your credit card numbers, but the MIT notion of hacking, and I think maybe they have
some of this at Yale too. Like creative exploration, whether it's a world of
code or worlds of, you know, tunnels or whatever it is, but basically all the ways in which we can think of constructing knowledge the way we construct a body of code. So, it's not just taking
an existing program with a bunch of parameters
and tuning the parameters, but it's actually
algorithms that write code. And there are all sorts of bubbling ideas, bubbling little examples of this, in the kind of cognitive AI
literature at this point. And I'll just say, check out this paper to see a review of some of those and how those might
apply to thinking about what I'd say is the
future of human learning, which is really algorithms
that write algorithms, okay. Whether it's a physics engine or all the other things
that we learn, okay. So that's it. This is the actual last slide. (audience applauding)
Okay. I hope you've gotten at least a taste of how we can use these tools to capture some of the aspects of common sense. But I also think the hardest questions remain. So now, I'm happy to have you throw them at me. - [Brian] Very good. By the way, Josh, you said one sentence and I counted, it was 22 sentences. - Okay.
(audience laughing) That's just what I deserve. - We have a dangerously
small time for questions now because Josh has a seminar to get to soon, but we will take at least
several minutes for questions. Since we have a lot of
students here in the audience, I think the first question for you, Josh has to be what kind of formative
educational experience, like what kind of college did
you go to that could have led to such a brilliant
ambitious research program? - Oh well, what a great question. I think it might have
been Ezra Stiles College. Or as we used to, yeah. It kind of looks like this, but different. Sort of. Yeah. So, yes. No. I was a Yale College undergrad. Before there was this great
cognitive science program, which is in large part
due to some of the efforts of Brian and a number of colleagues
here at this point, so. - Fantastic. So let's open the floor up
for questions here. Very good. - Yeah.
- Thank you. - Oh, do you want. - Thank you a lot for your talk. So, I'm new to this, so this might be naive. So, I heard about probabilistic coding and I know about neural networks. So, can you build a framework like what we currently do with chips? When we are designing a chip, we have very large software where we can use, for example, probabilistic coding to design the architecture and convert it to a neural network implementation. So, do you think that
framework is possible to build? And if yes, like where should we start? - Yeah, so if I understand you correctly, I think what you're asking is
can you implement these ideas, which I described, you know, let's say at the software level, right? I talked about programs and algorithms. Can you implement them
at the hardware level? Could you imagine a chip, let's say, that like implements these computations in some physical device? Much as we have chips that do something similar
for neural networks, some of them are specially designed chips, others are ways that people have figured out how to use GPUs, or graphics processing units, which are effectively chips that were designed originally for computer graphics but can be repurposed for training and running neural networks. And what I hear you asking
is, is there a similar way to design, like, physical circuits that implement these computations? Is that what you're asking? - [Student] Yeah, basically. Because probabilistic coding is fantastic for human understanding, while a neural network is what is currently implemented in our brain. So I'm trying to figure out,
- I see. So how does this fit with
neural networks in the brain? - Basically.
- Yeah. Okay. I mean, the first question
was a good question. This is a related, but even
better question, I think. And it's one that, there's no way I can really do it justice here, but for those who are in the class, I hope we can discuss it some more. So again, what I think is
lurking behind what you're saying, and I even reinforced this in the beginning of my talk, is that we have these things which we call neural networks, and that term actually
means two different things. There's the original neural network, which is the brain, right? If anything in the world
is a neural network, it's the brain 'cause it's, right, okay. But then there's these things called artificial neural networks, which are the tools that people in machine learning use, right? Now, they're related
to each other in that, the artificial neural networks, their basic units, their basic primitives, are in some way inspired
by what we understand about how actual neurons
in the brain work. But that inspiration is
loose and limited, right. In the very early days of neural networks in machine systems, that was a relatively tight link. I mean, we just didn't know very much. We didn't have very much in the way of computer technology, and we knew less about how
neurons in the brain work. At this point, what's
called neural networks in machine learning,
artificial neural networks, goes quite beyond anything that we actually understand about the brain, or maybe captures much less of it; I mean, basically, there's almost no relation at this point. There's some basic relation, but it's not right, I think, to take today's artificial neural networks and say that's how real biological neural networks are. Like, there is some relation, but we can't assume that
like the way to relate the software models of
probabilistic programming that I'm talking about here, the way to relate that to the brain is to go via an implementation in today's artificial neural networks. That's one possibility. And we and others have worked on that. Actually, I referred several
times to Ilker Yildirim's work. He has some really nice work in his group looking at that kind of thing, okay. Mario Belledonne, who's here somewhere. There's Mario. Mario's worked on that also with Ilker and a number of others, okay. So, that is one thing you can do. You can try to effectively compile these probabilistic programming things into an artificial neural net and say, maybe that's how the brain works. I'm not saying that's wrong. I'm just saying, we can't
assume that is right. Another thing you could try to do, and other people have
been working on this, is to try to say, well,
maybe there's some mapping between the probabilistic computations that go on under the hood to give you the numbers which I showed you here, and the probabilistic spiking behavior of neurons. You could have networks of stochastic spiking neurons, which, if you've studied neural circuit modeling, is actually often the way
computational neuroscientists describe real biological neural networks. And it's possible that those
kinds of implementations could, in a more direct way, implement some of the
probabilistic inferences that we do to produce the pictures I showed you here. Basically, the Monte Carlo principle says that you approximate probabilities with sums, with empirical expectations over stochastic simulations. That's what's behind a lot of what I showed you, and that can be implemented directly in networks of stochastic spiking neurons.
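In code, the Monte Carlo principle being gestured at is just this; the tower event and its noise model below are made up, but the structure, a probability approximated as an average over noisy simulations, is the point.

```python
import random

def monte_carlo_probability(simulate_once, n_samples=1000):
    """Approximate a probability by an empirical average over stochastic simulations."""
    return sum(simulate_once() for _ in range(n_samples)) / n_samples

def tower_falls_once(lean=0.3, noise=0.2):
    # One noisy mental rollout of a made-up question: does the tower tip over?
    return 1.0 if lean + random.gauss(0.0, noise) > 0.5 else 0.0

p_fall = monte_carlo_probability(tower_falls_once)   # converges on the tip-over probability
```

Each call to simulate_once plays the role of one stochastic rollout, which is the part that could, in principle, be carried by a network of stochastic spiking neurons.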
So, that's another route to try to make this mapping. For both of these, there's much to be done in the future to really see if any of those are going to work. But that's, I think, especially exciting: to try to relate these models of the mind to models of the brain. And I think that's also gonna be central to really having a fully satisfying answer to what kind of computation is cognition. - [Brian] That was a
very satisfying answer. Not all of the answers can be that long. - Okay. - [Brian] Let's take some more questions. And since that was a computational sort of oriented question, I believe one of the people who had their hands up still in the back, let's move to a philosophy
oriented question with a, we're not supposed to say our
names because of releases, but this person's name
may or may not be Michael. - Fortunately, we can do
probabilistic inference. - Thank you, Brian and thank you, Josh. A couple times in the talk, you mentioned you poo-poo the idea that
we're 99% of the way there, we're more like 40% of the way there. What would 100% of the way there be like? And would being 100% get
us success of any kind? So, I'm trying to envisage
what would success look like? - Right. So that, I did encourage
you to ask that question. So, thank you for asking
that difficult question. I think it depends. Well, wait, can you let him
keep the mic for one second? Do you mean success towards, so I used those numbers like 99% in both somebody's estimate
of how close are we to an AI goal, like self-driving cars, or also, how close are we towards some kind of cognitive modeling goal? Which one were you asking? - [Student] Well, I was
thinking like the success, like you have something that
can behave just like humans do. And could they pass a Turing test, for example? And why would that be success, if this Turing test can be passed? That's interesting, but what does it tell
us about the human mind? - Yeah. So, I don't know if I can
answer that in a short answer. And Brian has, - A haiku.
(audience laughing) Okay. This may be more
like 22 syllables than 12. But this is about scientific model building, right. And I don't think I'm gonna
be able to give you a fully, I don't know that I can give you a fully satisfying answer
to that question at all, let alone the one you want me
to answer right here, okay. But I think as in most other
areas of science, you know, how do we know when our
models are on the right track? Well, you could call it
some notion of coherence. And I think this is like basic
metaphysics and epistemology, which my philosophy colleague
can tell me more about. But when you have a model that
captures many different ways of getting at the same
phenomena, all right, that can include different
kinds of behavioral judgements that we can study in the lab, that can include acting like a person in the corresponding real-world scenarios. That's the sort of AI test of the model. And as we're starting to be
able to do, that can also, maybe even predict some
neural responses, okay. Which I didn't show you, that
these models are able to do, but they're starting to go
in that direction, okay. If this, if the same class
of models can explain or let's say can just fit all
those different sorts of data, and I would say, even
explain them in the sense that gives a functional explanation of why the behavior is the way it is, why the neurons are the way they are, because we would argue
this computational approach is actually solving
the real world problems in some efficient, effective way. I think that's a coherent kind of, at least epistemology here
and maybe even metaphysics, I think fundamentally, right, what is lurking behind this approach, and I can try to unpack this in less haiku like fashion later on is
that, I think at least, and I think most of us
as scientists think, the world is real, it
is really real thing. But the best we can do in our science is to build models of it. And that's true when we're doing science and it's true when we're
doing cognition also, right. This view of the mind that
I'm talking about here, and that is, I think, a canonical
one in cognitive science is this idea of the mind
as modeler a model builder. And I think the same kinds of tools of probabilistic modeling, prediction, including counterfactual
or hypothetical data, like the same things that
we think the mind does, we as scientists do, it's just the data, isn't the data coming
in through our retinas. It's the data that we get when people press buttons and say things, when we put things through
their retina, right? And I think that basic notion
of coherence and modeling is at the heart of why, you know, it's heart of how we, as humans intuitively understand the world, it's what makes satisfying understanding in pretty much every area
of science that I know. And I would love to explore more, trying to make that more
rigorous argument, I think. Let's just say, okay. - [Brian] Let's take one more question before Josh has shuttled up to the seminar that he'll talk to and someone who's name may or may not be Jack. - [Student] Hi, I have a question about sort of the architecture here, when you introduced the game engine part of cognition, part of learning, is the claim there that that's sort of, a separate self-sufficient module? And if so, how does it interface with symbolic or semantic representations? - Right. Okay. Because time is short, there's a lot to unpack there for people who maybe don't know everything
you were talking about. But I will just say, so these models, so, okay. So, there's I guess a standard
cognitive science view that I think maybe you're referring to, like a sort of photo modularity of mind, language of thought kind
of set of questions, right? So to translate this
picture into those terms, I would say there's not, the single module in the mind or brain
that's the game engine. . There's actually a set of brain systems that I can even tell you where they are based on some of my colleagues FMRI work, that are both, you know, somewhat functionally distinct subsystems. They also interact in ways. And like actually, a
good chunk of the brain can be mapped onto these different parts of the game engine in this model. Ilker wrote a nice review of this in "Current Opinion in
Neurobiology" a couple of years ago on at least the sort of
object physics part of that. So, I'll refer you to that
paper for one part of it. But I think with some other colleagues who've been studying like basically, social cognition networks, whether it's theory of mind or other ones. So, you can actually look
at the brain and see some, both modular structure, but
also interacting models. I would say they have some interesting informational interfaces. They're not as encapsulated
as Photer would've said, but there is some functionally
distinct sub-structure, okay. But you also asked about like traditional symbolic
and semantic knowledge. So, I think a really important
part of this research program is that these models are, what I think many cognitive scientists might call like pre-semantic. They're often more like perception. I mean Brian and I will argue about this probably in a few minutes. And I think, no, it's really interesting, and that in many ways, these models are more like
perception than like cognition. Even if Brian might say
sometimes, he might, I mean, I'll let you say. But like, they have some
components of the way Brian, who draws a strong line between
perception and cognition would call perception and others that are more like cognition. But they are certainly not like, you know, sort of traditional linguistic semantic verbally expressible knowledge. I think a lot of our early word meanings and some of our early syntax is actually, sits on top of this like
early syntax and semantics and syntax semantics mappings and like verb lexical semantics sits on top of this kind of basic stuff. And I mean, I'm just channeling things that like Ray Jackendoff or Tommy and many others would say, okay, or Lili Glitman, for example. But the key thing here is that symbols should not be restricted to language like semantic knowledge. It's essential that these models as we build them are symbolic, okay. Now, they also have to be neural. That's going back to the first question. But by far, at this point, by far our best models of these
kind of game engine things are symbolic representations
that have object structure, compositionality, all the usual nice like, photo pollution things, these
have them in spades, okay. They also support probabilistic inference and they can also support learning. So if one is used to thinking, you know, oh, well, symbolic languages that's somehow distinct from
like a statistical inference or probabilistic inference module. And then, somehow
learning comes into this. Like, if there's one note to maybe just take home from
this and end on it's that, I think to really understand cognition, we have to go beyond that way of, kind of parceling out responsibility in terms of different computational motifs and say, no, actually to
understand common sense, these all have to come to together. You have to be able to have fundamentally symbolic representations of
the structure of the world, but that support probabilistic inference and that also support learning. And I think, you know, we're
starting to see some steps. It's not 99% and it's not even 40%, but it's significant steps towards seeing how that synthesis
might play out, okay. - [Brian] With apologies
for the rush schedule. Let us thank Professor Josh Tenenbaum. (audience applauding) - Thanks. (students chattering)