MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum)

Today we have Josh Tenenbaum. He's a professor here at MIT, leading the Computational Cognitive Science group. Among many other topics in cognition and intelligence, he is fascinated with the question of how human beings learn so much from so little, and how these insights can lead us to build AI systems that are much more efficient at learning from data. So please give Josh a warm welcome.

All right, thank you very much. Thanks for having me. I'm excited to be part of what looks like really quite an impressive lineup, especially starting after today. It's, I think, quite a great opportunity to get to see perspectives on artificial intelligence from many of the leaders in industry and other entities working on this great quest. So I'm going to talk to you about some of the work that we do in our group, but I'm also going to try to give a broader perspective, reflective of a number of MIT faculty, especially those who are affiliated with the Center for Brains, Minds and Machines. You can see up there on my affiliation: academically I'm part of Brain and Cognitive Sciences, or Course 9, and I'm also part of CSAIL. But I'm also part of the Center for Brains, Minds and Machines, which is an NSF-funded Science and Technology Center, and which really stands for the bridge between the science and the engineering of intelligence. It literally straddles Vassar Street, in that we have CSAIL and BCS members. We also have partners at Harvard and other academic institutions. I want to try to convey some of the specific things we're doing in the center and where we want to go, with a vision that really is about jointly pursuing the basic science of how intelligence arises in the human mind and brain, and also the engineering enterprise of how to build something increasingly like human intelligence in machines. We deeply believe that these two projects have something to do with each other and are best pursued jointly. Now, it's a really exciting time to be doing
anything related to intelligence, or certainly to AI, for all the reasons that brought you all here. I don't have to tell you this. We have all these ways in which AI is kind of finally here. We finally live in the era of something like real practical AI, or, for those who've been around for a while and have seen some of the rises and falls, AI is back in a big way. But from my perspective, and I think maybe this reflects why we distinguish what we might call AGI from AI, we don't really have any real AI. Basically we have what I like to call AI technologies, which are systems that do things we used to think that only humans could do, and now we have machines that do them, often quite well, maybe even better than any human who's ever lived, like a machine that plays Go. But none of these systems, I would say, are truly intelligent. None of them have anything like common sense. None of them have anything like the flexible, general-purpose intelligence that each of you might use to learn every one of these skills or tasks. Each of these systems had to be built by large teams of engineers working together, often for a number of years, often at great cost to somebody who's willing to pay for it, and each of them just does one thing. So AlphaGo might beat the world's best, but it can't drive to the match, or even tell you what Go is. It can't even tell you that Go is a game, because it doesn't even know what a game is.

So what's missing? What is it that makes every one of your brains... Maybe you can't beat the world's best in Go, but any one of you can get behind the wheel of a car. I think of this because my daughter is going to turn 16 tomorrow. If she lived in California, she'd have a driver's license. It's a little bit down the line for us here in Massachusetts. But she didn't have to be specially engineered by billion-dollar startups. And she got really into chess recently, and now she's taught herself chess by
playing just, you know, a handful of games, basically. She can do any one of these activities, and any one of us can. So what is it that makes up the difference? Well, there are many things. The focus for us in our research, and for a lot of us in CBMM, is summarized here. What drives the successes right now in AI, especially in industry, in all these AI technologies, is many, many things. But where the progress has been made most recently, and what's getting most of the attention, is of course deep learning, and other kinds of machine learning technologies, which essentially represent the maturation of a decades-long effort to solve the problem of pattern recognition. That means taking data and finding patterns in the data that tell you something you care about, like how to label a class or how to predict some other signal. And pattern recognition is great. It's an important part of intelligence, and it's reasonable to say that deep learning as a technology has really made great strides on pattern recognition, and maybe is even coming close to solving the problems of pattern recognition.

But intelligence is about many other things. In particular, it's about modeling the world. Think about all the activities that a human does to model the world that go beyond just, say, recognizing patterns in data: actually trying to explain and understand what we see, for instance. Or being able to imagine things that we've never seen, maybe even very different from anything we've ever seen, but might want to see, and then to set those as goals, to make plans and solve problems needed to make those things real. Or think about learning. Some kinds of learning can be thought of as pattern recognition, if you're learning sufficient statistics or weights in a neural net that are used for those purposes. But many activities of learning
are about building out new models, either refining, reusing, improving old models, or actually building fundamentally new models as you've experienced more of the world. And then think about sharing our models, communicating our models to others, modeling their models, learning from them. All these activities of modeling are at the heart of human intelligence, and they require a much broader set of tools. So I want to talk about the ways we're studying these activities of modeling the world, and say something, in a pretty non-technical way, about the kinds of tools that allow us to capture these abilities.

Now, I want to be very honest up front and say this is just the beginning of a story. When you look at deep learning's successes, that itself is a story that goes back decades. I'll say a little bit about that history in a minute. But where we are now is just looking forward to a future when we might be able to capture these abilities at a really mature engineering scale. I would say we are far from being able to capture all the ways in which humans richly, flexibly, quickly build models of the world at the kind of scale that, say, Silicon Valley wants, either big tech companies like Google or Microsoft or IBM or Facebook, or small startups. We can get there, and what I want to talk to you about here is one route for trying to get there. This is the route that CBMM stands for: the idea that reverse engineering how intelligence works in the human mind and brain will give us a route to engineering these abilities in machines. When we say reverse engineering, we're talking about science, but doing science like engineers. This is our fundamental principle: if we approach cognitive science and neuroscience like an engineer, so that the output of our science isn't just a description of the brain or the mind in words, but is in the same terms that an engineer would use to build an intelligent system, then that will be both the
basis for a much more rigorous and deeply insightful science, and also a direct translation of those insights into engineering applications.

Now, I said before I'd talk a little about history. What I mean by that is this. If part of what brought you here is deep learning, and I know even if you've never heard of deep learning before, which I'm sure is unlikely, you saw a good spectrum of that in the overview session last night, it's really interesting and important to look back on the history of where techniques for deep learning came from, or reinforcement learning. Those are the two tools in the current machine learning arsenal that are getting the most attention: things like backpropagation, or end-to-end stochastic gradient descent, or temporal difference learning, or Q-learning. Here are a few papers from the literature. Maybe some of you have read these original papers. Here's the original paper by Rumelhart, Hinton and colleagues, in which they introduced the backpropagation algorithm for training multi-layer perceptrons, multi-layer neural networks. Here's the original perceptron paper by Rosenblatt, which introduced the one-layer version of that architecture and the basic perceptron learning algorithm. Here's the first paper on the temporal difference learning method for reinforcement learning, from Sutton and Barto. Here's the original Boltzmann machine paper, also by Hinton and colleagues, which, for those who don't know that architecture, is a kind of probabilistic, undirected multi-layer perceptron. Or, for example, before there were LSTMs, if you know about current recurrent neural network architectures, much simpler versions of the same idea were proposed by Jeff Elman in his simple recurrent networks. The reason I want to put up the original papers here is for you to look at both when they were published and where they were published. If you look at the dates, you'll see papers
going back to the '80s, but even the '60s or even the 1950s. And look at where they were published. Most of them were published in psychology journals. The journal Psychological Review, if you don't know it, is like the leading journal of theoretical psychology and mathematical psychology. Or Cognitive Science, the journal of the Cognitive Science Society. Or the backprop paper, which was published in Nature, a general-interest science journal, but by people who were mostly affiliated with an Institute for Cognitive Science in San Diego. So what you see here is already a long history of scientists thinking like engineers. These are people who were in psychology or cognitive science departments and publishing in those places, but by formalizing even very basic insights about how humans might learn, or how brains might learn, in the right kind of math, that led of course to progress on the science side, but it also led to all the engineering that we see now. It wasn't sufficient. We needed, of course, lots of innovations and advances in computing hardware and software systems. But this is where the basic math came from, and it came from doing science like an engineer.

So what I want to talk about in our vision is: what does the future of this look like? If we were to look 50 years into the future, what would we be looking back on now, or over this time scale? Well, here's a long-term research roadmap that reflects some of my ambitions and some of our center's goals, and many others' too. We'd like to be able to address basic, fundamental questions of what it is to be and to think like a human: questions, for example, of consciousness, or meaning in language, or real learning; questions even beyond the individual, like questions of culture or creativity. So those are the big ideas up there, and for each of these there are basic scientific questions. How do we become aware of the world and ourselves in it?
It starts with perception, but it really turns into awareness, awareness of yourself and of the world, and what we might call consciousness. Or how does a word start to have a meaning? What really is a meaning, and how does a child grasp it? Or how do children actually learn? What do babies' brains actually start with? Are they blank slates, or do they start with some kind of cognitive structure? And then what does real learning look like? These are just some of the questions that we're interested in working on. Or when we talk about culture, we mean: how do you learn all the things you didn't directly experience, but that somehow you got from the accumulation of knowledge in society over many generations? Or how do you ever think of new ideas, or answers to new questions? How do you think of the new questions themselves? How do you decide what to think about? These are all key activities of human intelligence. When we talk about how we model the world, where our models come from, what we do with our models, this is what we're talking about. And if we could get machines that could do these things, well, again, on the bottom row, think of all the actual real engineering payoffs.

Now, in our center, in both my own activities and a lot of what my group does these days, and what a number of other colleagues in the Center for Brains, Minds and Machines do, as well as, very broadly, people in BCS and CSAIL, there is one place where we work on the beginnings of these problems in the near term. This roadmap is the long term: think 50 years, maybe shorter, maybe longer, I don't know, but think well beyond 10 years. In the short term, 5 to 10 years, a lot of our focus is around visual intelligence, and there are many reasons for that. Again, we can build on the successes of deep networks and a lot of pattern recognition and machine vision. It's a good way to put these ideas into practice. When we look at the actual brain, the visual system in the brain, in the human and other
mammalian brains, for example, is really very clearly the best understood part of the brain, and at a circuit level it's the part of the brain that has most inspired current deep learning and neural network systems. But even there, there are things which we still don't really understand like engineers. So here's an example of a basic problem in visual intelligence that we and others in the center are trying to solve. Look around you, and you feel like there's a whole world around you, and there is a whole world around you, and you feel like your brain captures it. But the actual sense data that's coming in through your eyes looks more like this photograph here, where you can see there's a crowd scene, but it's mostly blurry except for a small region of high resolution in the center. That corresponds biologically to the part of the image that is in your fovea. That's the central region of cells in the retina where you have really high-resolution visual data. The size of your fovea is roughly like if you hold out your thumb at arm's length; it's a little bit bigger than that, but not much bigger. Most of the image, in terms of the actual information coming in, in a bottom-up sense, to your brain, is really quite blurry. But somehow, by looking at just one part and then by saccading around, making a few eye movements, you get a few glimpses, each not much bigger than the size of your thumb at arm's length, and somehow you stitch that information together into what feels like, and really is, a rich representation of the whole world around you.

And when I say around you, I mean literally around you. So here's another kind of demonstration. Without turning around (nobody's allowed to turn around), ask yourself: what's behind you? Now, the answer is going to be different for different people, depending on where you're sitting. For most of you, you might think, well, I think there's a person pretty close behind me. You're in a crowded auditorium. Although you haven't seen that person,
you know that they're there. For people in the very back row, you know there isn't a person behind you, and you're conscious of being in the back row. You might be conscious that there's a wall right behind you. But now, for the people in the room who are not in the very back, think about how far behind you the back is. Where's the nearest wall behind you? Maybe we can call out, try a little demonstration. So, I don't know, I'm pointing to someone there. Can you say something if you think I'm pointing at you? Well, I could have been pointing at you, but I'm pointing at someone behind you. OK, I'll point to you. Yeah, I'm pointing to you. All right, so how far is the nearest wall? No, you can't turn around. You've blown your chance. Without turning around. OK, so you... OK, do you see I'm pointing to you there, with the tie? OK, so without turning around, how far is the nearest wall behind you? Sorry, how far? Five meters. OK, well, that might be about right. No, other people can turn around. How about you? How far is the nearest wall behind you? Ten meters. OK, that might be right. Yeah, how about here? What do you think? Twenty. OK. Since I didn't grow up in the metric system, I barely know. But the point is that each of you is surely not exactly right, but you're certainly within an order of magnitude. And if we actually tried to measure, my guess is you're probably right within 50 percent or less, often maybe just 20 percent error. So how do you know this? Even if it's not, what did you say, 20 meters, even if it's not 20 meters, it's probably closer to 20 meters than it is to 5 or 10 meters, and than it is to 50 meters. So how do you know this? You haven't turned around in a while, but some part of your brain is tracking the whole world around you. And how many people are behind you? Yeah, like a few
hundred, right? I don't know if it's 200 or 300, but it's not a thousand, I don't think so, and it's certainly not 10 or 20 or 50. So you track these things and you use them to plan your actions. Again, think about how instantly, effortlessly and very reliably your brain computes all these things: the people and objects around you. And it's not just approximations. Certainly when we're talking about what's behind you in space, there's a lot of imprecision, but when it comes to reaching for things right in front of you, there are very precise shape and physical property estimates needed to pick up and manipulate objects. And when it comes to people, it's not just the existence of the people but something about what's in their heads. You track whether someone's paying attention to you when you're talking to them, what they might want from you, what they might be thinking about you, what they might be thinking about other people. So when we talk about visual intelligence, this is the whole stuff we're talking about. And you can start to see how it turns into basic questions, I think, of what we might call the beginnings of consciousness, or at least our awareness of ourselves in the world, and of ourselves as a self in the world, but also other aspects of higher-level intelligence and cognition that are not just about perception, like symbols. To describe, even to ourselves, what's around us and where we are and what we can do with it, you have to go beyond just what we would normally call the stuff of perception, to, say, the thoughts in somebody's head and your own thoughts about that.

So what we've been doing in CBMM is trying to develop an architecture for visual intelligence. I'm not going to go into any of the details of how this works, and this is just notional. This is just a picture, just a sketch from a grant proposal of what we say we want to do. But it's based on a lot of scientific understanding of how the
brain works. There are different parts of the brain that correspond to these different modules in our architecture, as well as some kind of emerging engineering way to try to capture, at the software and maybe even hardware levels, how these modules might work. So we talk about an early module, a visual or perceptual stream, which takes bottom-up visual or other perceptual input. That's the kind of thing that is pretty close to what we currently have in, say, deep convolutional neural networks. But then the output of that isn't just pattern class labels, but what we call the cognitive core, core cognition: an understanding of space and objects, their physics, other people, their minds. That's the real stuff of cognition, and it has to be the output of perception. But somehow (this is what we call the brain OS in this picture) we have to get there by stitching together the bottom-up inputs, a glimpse here, a glimpse there, a little bit here and there, and accessing prior knowledge that comes from our memory systems, to tell us how to stitch these things together into the really core cognitive representations of what's out there in the world. And then if we're going to start to talk about it in language, or to build plans on top of what we have seen and understood, that's where we talk about symbols coming into the picture, the building blocks of language and plans and so on.

So now we might say, OK, this is an architecture that is brain-inspired and cognitively inspired, and we're planning to turn it into real engineering. And you can say, well, do we need that? Again, I know this is a question you considered in the first lecture. Maybe the engineering toolkit that's currently been making a lot of progress in, let's say, industry, maybe that's good enough. Let's take deep learning to stand for a broader set of modern pattern-recognition-based and reinforcement-learning-based tools, and
say, OK, well, maybe that can scale up to this. And maybe that's possible. I'm happy, in the question period, if people want to debate this. My sense is no. When I say no, I don't mean that it can't happen or it won't happen. What I mean is that the highest-value, highest-expected-value route right now is to take this more science-based reverse engineering approach, and that, at least if you follow the current trajectory that industry incentives especially optimize for, it's not even really trying to take us to these things.

So think about, for example, a case study of visual intelligence that is in some ways, as pattern recognition, very much a success. It's again been mostly driven by industry. It's something where, if you read about it in the news, or even play around with it on certain publicly available datasets, it feels like we've made great progress. This is an aspect of visual intelligence which is sometimes called image captioning. It's basically mapping images to text. There's been a bunch of systems. Here are a couple of press releases. I guess this one's about Google: "Google's AI can now caption images almost as well as humans." Here's one about Microsoft. A couple of years ago, I think there were something like eight papers all released onto arXiv around the same time, from basically all the major industry computer vision groups as well as a couple of academic partners, all driven by basically the same dataset, produced by some Microsoft researchers and other collaborators. They trained a combination of deep convolutional neural networks, state-of-the-art visual pattern recognition, with recurrent neural networks, which had recently been developed for, basically, kinds of neural statistical language modeling. They glued them together and produced a system which got very impressive results on a big training set and a held-out test set, where the goal was to take an image and write a sentence, like a
short sentence caption, that would seem like the kind of way a human would describe that image. These systems surpassed human-level accuracy on the held-out test set from a big training set. But what you can see when you really dig into these things is that there's often a lot of what I would call dataset overfitting. It's not overfitting to the training set, but it's overfitting to whatever are the particular characteristics of this dataset, wherever it came from: a certain set of photographs and certain ways of captioning them. Even with a big dataset, it's not about quantity; it's more about the quality, the nature of what people are doing.

So one way to test this system is to apply it to what seems like basically the same problem, but not within a certain curated or built dataset, and there's a convenient Twitter bot that lets you do this. There's something called the picdescbot, which takes one of the state-of-the-art industry AI captioning systems, a very good one. Again, I'm not trying to critique these systems for what they're trying to do; I'm just trying to point out what they don't really even try to do. So this takes the Microsoft CaptionBot and, every couple of hours, takes a random image from the web, captions it, and uploads the results to Twitter. A couple of months ago, when I prepared a first version of this talk, I just took a few days in the life of this Twitter bot. I didn't take every single image, but I took most of the images, in a way that was meant to be representative of the successes and the kinds of failures that such a system will make. So we can go through this, and it's a little bit entertaining and, I think, quite informative. Here's just a somewhat random sample of a few days in the life of one of these caption bots. So here we have a picture of a person holding... My screen is very small here and I can't read up there, so maybe you'll have to tell me. What's that? A
person holding a cell phone, I guess. I'll just read along with you. So: "a person holding a cell phone." Well, it's not a person holding a cell phone, but it's kind of close. It's a person holding some kind of machine. I don't even know what that is, but it's some kind of musical instrument. So that's a mixed success or failure. Here's a pretty good one: "a group of people on a field playing football." I would call that an A result, maybe even an A+. Here's "a group of people standing on top of a mountain." So, less good. There's a mountain, but as far as I can tell there are no people. But these systems like to see people, because in the dataset they were trained on there are a lot of people, and people often talk about people. And the fact that you can appreciate both what I said and why it's funny, there you did some of the cognitive activities that this system is not even trying to do. Here we've got "a building with the cake." I'll go through these fast. "A large stone building with the clock tower." I think that's pretty good. I'd give that like a B+. There's no clock, but it's plausibly right. There might be a clock in there; there's definitely something like that. Here's "a truck parked on the side of a building." I don't know, maybe a B-. There is a car on the side of a building, but it's not a truck, and it doesn't seem like the main thing in the image. Here's "a necklace made of bananas." Here's "a large ship in the water." This is pretty good. I'd give this like an A- or B+, because there is a ship in the water, but it's not very large. It's really more like a tugboat or something. Here's "a sign sitting on the grass." In some sense that's great, but in another sense it's really missing what's actually interesting and important and meaningful to humans. Here's "a garden is in the dirt." "A pizza sitting on top of the building." A small house
with the red brick building. That's pretty good, although a kind of weird way of saying it. "A vintage photo of a pond." That's good; they like vintage photos. "A group of people that are standing in the grass near a bridge." Again, there are two people, and there's some grass, and there's a bridge, but it's really not what's going on. "A person in the yard." OK, kind of. "A group of people standing on top of the boat." There's a boat, there's a group of people, they're standing, but again the sentence that you see is more based on a bias of what people have said in the past about images that are only vaguely like this. "A clock tower is lit up at night." That's really, I think, pretty impressive. "A large clock mounted to the side of the building." A little bit less so. "A snow-covered field." Very good. "A building with snow on the ground." A little bit less good; there's no snow, it's white. Some people: I don't know them, but I bet that's probably right, because identifying faces and recognizing people who are famous because they won medals at the Olympics, probably I would trust current pattern recognition systems to get that. "A painting of a vase in front of a mirror." Less good. Also a famous person there, but we didn't get him. "A person walking in the rain." Again, there is sort of a person, and there are some puddles, but not, you know. "A group of stuffed animals." "A car parked in a parking lot." That's good. "A car parked in front of a building." Less good. "A plate with a fork and knife." "A clear blue sky." OK, so you get the idea. Again, if you actually go and play with the system, it may do better, partly because my friends at Microsoft told me they've improved it some. And this is partly for entertainment value: I chose what also would be the funnier examples, so I want to be quite honest about it. I'm not trying to take away from what are impressive AI technologies. But I think it's clear, when it comes to understanding any one of these images, that it's important
to see that even when it seems to be correct, if it can make the kind of errors that it makes, it's probably not doing what you're doing, and it's probably not even trying to scale towards the dimensions of intelligence that we think about when we're talking about human intelligence.

Another way to put this: I'm going to show you a really insightful blog post from one of your other speakers. In a couple of days, I'm not sure, you're going to have Andrej Karpathy, who's one of the leading people in deep learning. This is a really great blog post he wrote a couple of years ago, when he was, I think, still at Stanford. He got his PhD from Stanford, he worked at Google a little bit on some early big neural net AI projects there, he was at OpenAI, he was one of the founders of OpenAI, and recently he joined Tesla as their director of AI research. But about five years ago he was looking at the state of computer vision from a human intelligence point of view and lamenting how far away we were. So this is the title of his blog post: "The state of Computer Vision and AI: we are really, really far away." And he took this image, which was a famous image in its own right. It was a popular image of Obama, back when he was president, kind of playing around as he liked to do when he was on tour. If you take a look at this, you probably all can recognize the previous President of the United States, but you can also get the sense of where he is and what's going on, and you might see people smiling, and you might get the sense that he's playing a joke on someone. Can you see that? So how do you know that he's playing a joke, and what that joke is? Well, as Andrej goes on to talk about in his blog post, if you think about all the things that you have to really deploy in your mind to understand that, it's a huge list. Of course it starts with seeing people and objects, and maybe doing some face recognition, but you have
to do things like for example notice his foot on the scale and understand enough about how scales work that when a foot presses down it exerts force that the scale is sensitive doesn't just magically measure people's weight but it does that somehow through force you have to see who can see that he's doing that and who can't who cannot see that he's doing that right in particularly the person on the scale and why some people can see that he's doing that and can see that some other people can't see it why that makes it funny to them okay and someday we should have machines that can understand this but hopefully you can see why what I would I what the kind of architecture that I'm talking about would be the building blocks of the ingredients to be able to get them to do that now I when I again I prepared a version of this talk a few months ago and I wrote to Andre and I said I was gonna use this and I was curious if he how what you know if he had any reflections on this and where he thought we were relative to five years ago because a certain a lot of progress has been made but he said here's his email I hope he doesn't mind me sharing it but I mean again he's a very honest person and that's one of the many reasons why he's such an important person right now in AI okay he's both very technically strong and honest about what we can do what we can't do and as he says well what does he say it's nice to hear from you it's funny you should bring this up I was also thinking about writing a a return to this and in short basically I don't believe we've made very much progress right he points out that in his long list of things that you'd need to understand the image we have made progress on some the ability to again detect people and do face recognition for well-known individuals okay but that's kind of about it all right and he wasn't particularly optimistic that the current route that's being pursued an industry is is anywhere close to solving or even really trying to solve 
these larger questions um if we give this image to that caption bot you know what we see is again represents the same point so here's the caption bot it says I think it's a group of people standing next to a man in a suit and tie right so that's right right as far as it goes it just doesn't go far enough and the current the current ideas of build a data set train a deep learning algorithm on it and then repeat um aren't really even I would venture trying to get to what we're talking about or here's another I'll just give you one other example of a couple of photographs from my recent vacation in a nice warm tropical locale which I think illustrates ways in which again the gap where we have machines that can say beat the world's best at Go but can't even beat a child at tic-tac-toe now what do I mean by that well you know of course we can build we don't even need reinforcement learning or deep learning to build a machine that can win or tie that is do optimally in tic-tac-toe but think about this this is a real tic-tac-toe game which I saw on the grass outside my hotel right what do you have to do to look at this and recognize that it's a tic-tac-toe game you have to see the objects you have to see what's you know in some sense there's a three by three grid but it's but it's only abstract right it's only delimited by this these ropes or strings okay it's not actually a grid in any simple geometric sense all right but yet a child can look at that and indeed here's an actual child who was looking at it and recognized oh it's a game of tic-tac-toe and even know what they need to do to win put the X and complete it and now they've got three in a row right that's that's literally child's play okay you show this sort of thing though to one of these you know image understanding caption bots and I think it's a close-up of a sign okay again it's not like saying that this is a close-up of a sign is is not the same thing I would venture as a as a cognitive or
computational activity that's going to give us what we need to say recognize the objects to recognize it as a game to understand the goal and how to plan to achieve those goals whereas this kind of architecture is designed to try to do all of these things ultimately right and I bring in these examples of games or jokes to really show where perception goes to cognition you know that and all the way up to symbols right so to get objects and forces and mental states that's the cognitive core but to be able to get goals and plans and what do I do or how do I talk about it that's symbols okay here's another way into this and it's one that also motivates I think a lot of really good work on the engineering side and a lot of our interest in the science side is think about robotics and think about what do you have to do to you know what does the brain have to be like to control the body so again you're gonna hear from shortly I think maybe it's next week from Marc Raibert who's one of the founders of Boston Dynamics which is one of my favorite companies anywhere they're without doubt the leading maker of humanoid robots legged locomoting robots in industry they have all sorts of other really cool robots robots like dogs robots that have all you know I think you'll even get to see a live demonstration of my new robots this really awesome impressive stuff okay um but what about the minds and brains of these robots well again if you ask Marc ask him how much of human-like cognition do they have in their robots and I think he would say very little in fact we have asked him that and he would say very little he has said very little he's actually one of the advisors of our Center and I think in many ways we're very much on the same page we both want to know how do you build the kind of intelligence that can control these bodies like the way a human does alright um here's another example of an industry robotics effort this is Google's arm farm where you know they've they've got
lots of robot arms and they're trying to train them to pick up objects using various kinds of deep learning and reinforcement learning techniques and I think it's one approach I just think it's very very different from the way humans learn to say control their body and manipulate objects and you can see that in terms of things that go back to what you were saying when you're introducing me right think about how quickly we learn things right here you have these the arm farm is trying to generate you know effectively maybe if not infinite but hundreds of thousands millions of examples of reaches and pickups of objects even with just a single gripper and yet a child who in some ways can't control their body nearly as well as robots can be controlled at the low level and is able to do so much more so I'll show you two of my favorite videos from YouTube here which motivate some of the research that we're doing the one on the left is a one and a half year old and the other ones a one year old so just watch this one and a half year old here doing a popular activity for many kids as a playing hmm you see video up there I'd okay there we go okay so he's he's on doing this stacking Cup activity alright he's stacking up cups to make a tall tower he's got a stack of three and what you can see for the first part of this video is it looks like he's trying to make a second stack and that he's trying to pick up at once basically he's trying to make a stack of two that'll go on the stack of three and you know he's trying to debug his plan because it's it got a little bit stuck here but and think about I mean again if you know anything about robots manipulating objects even just what he just did no robot can decide to do that and actually do it right at some point he's almost got it it's a little bit tricky but at some point he's gonna get that stack of two he realizes he has to move that object out of the way look at what he just did move it out of the way use two hands to pick it 
up and now he's got a stack of two on a stack of three and suddenly you know subgoal completed he's now got a stack of five and he gives himself a hand because he know he knows he accomplished a keyway point along the way to his final goal that's a kind of early symbolic cognition right to understand that I'm trying to build a tall tower but a tower is made up of little towers it's you know it can end and you can take a tower and put it on top of another tower or stack a stack on us a can you have a bigger stack right so think about how he goes from bottom up perception to the objects of the physics needed to manipulate the objects to the ability to make even those early kinds of symbolic plans at some point he keeps doing this he puts another stack on there I'll just jump to the end oops sorry you missed it so he he gets really excited and he gives himself another big hand but falls over okay again Boston Dynamics now has robots that could pick themselves up after that that's really impressive again but all the other stuff to get to that point we don't really know how to do in a robotic setting or think about this baby here this is a younger baby this is one of the Internet's very most popular videos because it features a baby and a cat and but the babies doing something interesting he's got the same cups but he's decided he's again decided to try a new thing so this think about creativity he's decided that his goal is to stack up cups on the back of a cat I guess he's asking how many cups can I fit on the back of a cat well three let's see can I fit more let's try another one okay well he can't fit more than three it turns out and then he then does it's not working so he changes his goal now his goal appears to be to get the cups on the other side of the cat now watch that part when he reaches back behind him there that's I'll just pause it there for a moment umm someone he just reached back there that's a particularly striking moment in the video it shows a very 
strong form of what we call in cognitive science object permanence okay that's the idea that you represent objects as these permanent enduring entities in the world even when you can't see them in this case he hadn't seen or touched that object behind him for like at least a minute right maybe much longer I don't know and yet he still knew it was there and he was able to incorporate it in his plan right there's a moment before that when he's about to reach for it but then he sees this other one right and it's only when he's now exhausted all the other objects here that he can see he's like okay now time to get this object and bring it into play right so think about what has to be going on in his brain for him to be able to do that right that's like the analog of you understanding what's behind you okay um it's not that these things are impossible to capture machines far from it it's just that like training a deep neural network or any kind of pattern recognition system we don't think is going to do it but we think by reverse engineering how it works in the brain we might be able to do it I think we can can do it okay it's not just humans that do this kind of activity here's a couple of again rather famous videos you can watch all of these on YouTube crows are famous object manipulators and tool users but also orangutangs other primates rodents we can watch if we just hey let me pause this one for a second if we watch this orangutan here he's got a bunch of big legos and over the course of this video he's building up a stack legos it's really quite impressive you're just jumping to the end there's actually some controversy out there of whether this video is a fake but the controversy isn't about you know it's not like whether it was I don't know dumb with computer animation some people think the video was actually filmed backwards that a human built up the stack and the orangutan just slowly disassembled it piece by piece and it turns out it's remarkably hard to 
tell whether it's played forward or backwards in time and people have argued over little details because you know it would be quite impressive if an orangutan actually was able to build up this really impressive stack of Legos but I would submit that it would be almost as impressive if he disassembled it think about the activity I mean if I wanted to disassemble that the easiest thing to do would just be to knock it over that's really all most robots could do but to piece by piece disassemble it even if it's played backwards like this that's still a really impressive act of symbolic planning on physical objects or here you've got this this famous mouse this you can find on the internet under the mouse versus cracker video and what you'll see here over the course of this video is a mouse valiantly and mostly hopelessly struggling with a cracker that they're hoping to bring back to their nest I guess it's a very appealing big meal and at some point after just trying to get it over the over the wall at some point the mouse just gives up because it's just never gonna happen and he just goes away except that because even mouses can dream or mice can dream at some point he decides okay I'm just gonna come out for one more try and he tries one more time and this time valiantly gets it over yeah isn't that very impressive congratulations guys okay you don't have to clap for me you can clap for me at the end or clap for whoever later okay but I want to applaud the mouse there every time I see that okay but again think what had to be going on in his brain to be able to do that all right it's a crazy thing and yet he formulated the goal and was able to achieve it I'll just show one more video that is really more about science these other ones are you know some of them actually were from scientific experiments but this is one that motivates a lot of the science that I do and it's to me it sets up kind of a grand cognitive science challenge for AI and robotics it's from an experiment
with humans again eighteen month olds or one-and-a-half year olds so the the kids in this experiment were the same age as the first baby I showed you the one who did the stacking and 18 months is really a very very good age to study if you're interested in intelligence for reasons we can talk about later if you're interested this is from a very famous experiment done by two psychologists Felix Warneken and Michael Tomasello and it was studying the spontaneous helping behavior of young children it also contrasted humans and chimps and the punchline is that chimps sometimes do things that are kind of like what this human did but not nearly as reliably or as flexibly okay so not nearly it is and I'll show you a particular kind of unusual situation where human kids had relatively little trouble figuring out kind of what to do or even whether they should do it whereas basically no chimp did what you're gonna see humans sometimes doing here so the experimenter in this movie I'll turn on the sound here if you can hear it the experimenter is the tall guy and the participant is the little kid in the corner there there there's sound but no words right and at some point he stops and then the kid just does whatever they want to do so watch what he does he goes over he opens the cabinet looks inside then he steps back and he looks up at Felix and then looks down okay and then the action is completed now I want you to watch it one more time and think about what's gotta be going on inside the kid's head to understand this to understand like so it seems like what it looks like to us is the kid figured out that this guy needed help and helped him and the paper is full of many other situations like this this is just one OK but the key idea is that the situation is somewhat novel people have seen people holding books and opening cabinets but probably it's very rare to see this kind of situation exactly right it's different in some important details from what you might have
seen before and there's other ones in there that are really truly novel because they just made up a machine right there okay but somehow he has to understand causally from the way the guy's banging the books against the thing that it's it's sort it's sort of both a symbol but it's also somehow he's got to understand what he can do and what he can't do and then what the kid can do to help and I'll show this again but really just watch the main part I want you to see is I'll just sort of skip ahead so watch this part here let's say I'll just jump right when he watch right now he's about to look up he looks up and makes eye contact and then his eyes look down so again he looks up he looks up and then a saccade a sudden rapid eye movement down down to his hands up down okay so that's again that's this brain OS in action right he's making one glance small glance at the big guy's eyes just to make eye contact to see to get a signal did I understand what you wanted and did you did you register that joint attention and then he makes a prediction about what the guy's gonna do so he looks right down he doesn't just like look around randomly he looks right down to the guy's hands to track the action that he expects to see happening if I did the right thing to help you then I expect you're gonna put the books there okay so you can see these things happening and we want to know what's going on inside the mind that guides all of that all right so that's the sort of big scientific agenda that we're working on over the next few years where we think some kind of human understanding of human intelligence in scientific terms could lead to all sorts of AI payoffs in particular suppose we could build a robot that could do what this kid and many other kids and these experiments do just say help you out around the house without having to be programmed or even really instructed just to kind of get a sense oh yeah you need to have at that shirt let me help you out okay even 18 month olds 
will do that sometimes not very reliably or effectively sometimes they'll try to help and really do the opposite right but imagine if you could take the the flexible understanding of human actions goals and so on and make those into reliable engineering technology that would be very useful and it would also be related to say machines that you can actually start to talk to and trust in some ways right that shared understanding so how are we gonna do this well let me spend the rest of the time talking about how we try to do this right some of the some of the technology that we're building both in our group and more broadly to try to make these kinds of architectures real and I'll talk about two or three technical ideas again not in any detail all right um one is the idea of a probabilistic program so this is a kind of a you think of it as a computational abstraction that we can use to capture the common-sense knowledge of this core cognition so when I say we have an intuitive understanding of physical objects and people's goals how do I build a model of that model you have in the head probabilistic programs a little bit more technically are one way to understand them is as a generalization of Bayesian networks or other kinds of directed graphical models if you know those okay but where instead of defining a probability model on a graph you define it on a program and thereby have access to a much more expressive toolkit of knowledge representation so data structures other kinds of algorithmic tools for representing knowledge okay but you still have access to the ability to do probabilistic inference like in a graphical model but also causal inference in a directed graphical model so for those of you who know about graphical models that might make some sense to you but just more broadly what this is think of this as as a toolkit that allows us to combine several of the best ideas not just of the recent deep learning era but over if you look back over the whole scope of AI
and as well as cognitive science I think there's three or four ideas there and more but definitely like three ideas we could really put up there that have proven their worth and have have had have risen and fallen in terms of each of these had ideas when the mainstream of the field thought this was totally the way to go and every other idea was was obviously a waste of time and also had its time when many people thought it was a waste of time okay and these three big ideas I would say are first of all the idea of symbolic representation or symbolic languages for knowledge representation probabilistic inference in generative models to capture uncertainty ambiguity learning from sparse data and in their hierarchical setting learning to learn right and then of course the recent developments with neural inspired architectures for pattern recognition okay each of these things each of these ideas symbolic languages probabilistic inference and neural networks has some distinctive strengths that are real weak points of the other approaches right so to take one example but I haven't really talked about here people in the but I but you but you mentioned as an outstanding challenge for neural networks transfer learning we're learning to take knowledge across a number of previous tasks to transfer to others this is a real challenge and has always been a challenge in a neural net ok but is something that's addressed very naturally and very scalable in for example a hierarchical Bayesian model and if you look at some of the recent attempts really interesting attempts within the deep learning world to try to get kinds of transfer learning and learning to learn they're really cool ok but many of them are in some ways kind of reinventing within a neural network paradigm ideas that people you know maybe just 10 or 15 years ago developed in very sophisticated ways in let's say hierarchical Bayesian models ok and a lot of attempts to get sort of symbolic algorithm like behavior in 
neural networks again are really you know they're very small steps towards something which is a very mature technology in computer systems and programming languages probabilistic programs I'll just sort of advertise mostly are a way to combine the strengths of all of these approaches to have knowledge representations which are as expressive as anything that anybody ever did in the symbolic paradigm that are as flexible at dealing with uncertainty and sparse data as anything in the probabilistic paradigm but that also can support pattern recognition tools to be able to for example to do very fast efficient inference in very complex scenarios and there's a number of probably that's that that's the kind of conceptual framework there's a number of actually implemented tools I point to here on the slide a number of probabilistic programming languages which you can go explore for example there's one that was developed in our group a few years ago almost 10 years ago now called Church which was the antecedent of some of these other languages built on a functional programming core so Church is a probabilistic programming language built on the lambda calculus or really in Lisp basically but there are many other more modern tools especially if you are interested in neural networks there are tools like for example Pyro or ProbTorch or BayesFlow that try to combine all these ideas in a or for example Gen here which is a project of Vikash Mansinghka's probabilistic computing group these are all things which are just in the very beginning stages very very alpha but you can find out more about them online or by writing to their creators and I think this is a this is a very exciting place where the convergence of a number of different AI tools are happening and when and this will be absolutely necessary for making the kind of architecture that I'm talking about work another key idea which we've been building on in our lab and I think again many people are using some version
of this idea but maybe a little bit different from the way we're doing it is what what version of this idea that I'd like to talk about is what I call the game engine in the head so this is the idea that it's really what the programs are about when I talk about probabilistic programs I haven't said anything about what kind of programs we're using we're just basically these probabilistic programming languages at their best and Church the language that that was developed by Noah Goodman and Vikash and others and Dan Roy in our group some 10 years ago was intended to be a Turing-complete probabilistic programming language so any probability model that was computable or for whose inferences conditional inferences are computable you could represent in these languages but that that leaves completely open what what I'm actually gonna what what kind of program I'm gonna write to model the world and I've been very inspired in the last few years by thinking about the kinds of programs that are in modern video game engines so again I'm probably most of you are familiar with these but if you're and increasingly they're playing a role in all sorts of ways in AI but these are tools that were developed by the video game industry to allow a game designer to make a new game with without having to do most of in some sense much of the hard technical work from scratch but rather to focus on the characters the world the story okay the things that are more interesting for designing a novel game in particular we if we want a player to explore some new three-dimensional world but to have them be able to interact with the world in real time and to render nice looking graphics in in real time in an interactive way as the player moves around and explores the world or if you want to populate the world with non-player characters that will behave in a even vaguely intelligent way okay game engines give you tools for doing all of this without having to write all of graphics from
scratch or all of physics the rules of physics from scratch so what are called game physics engines and in some sense are a set of principles but also hacks from Newtonian mechanics and other areas of physics that allow you to simulate plausible looking physical interactions in very complex worlds very approximately but very fast there's also what's called game AI which are basically very simple planning models so let's say I want to have an AI in the game that is like a guard that guards a base and a player is gonna attack the base so back in the old Atari days like when I was a kid you know the guards would just be like random things that would fire missiles kind of randomly in random directions at random times right but let's say you want a guard to be a little intelligent so to actually look around him oh and I see the player and then to actually start shooting at you and to even maybe pursue you so that requires putting a little AI in the game and you do that by having basically simple agent models in the game so what we think and some of you might think this is crazy and some of you might think this is very natural idea I get both kinds of reactions what we think is that these tools of you know fast approximate renderers physics engines and sort of very simple kinds of AI planning are an interesting first approximation to the kinds of common-sense knowledge representations that evolution has built into our brains so when we talk about the cognitive core or how do babies start what's what you know ways in which a baby's brain isn't a blank slate one interesting idea is that it starts with something like these tools and then wrapped inside a framework for probabilistic inference that's what we mean by probabilistic programs that can support many activities of common sense perception and thinking so I'll just give you one example what we call this intuitive physics engine okay so this is work that we did in our group that Pete Battaglia and Jess Hamrick did
started this work about five years ago now where we showed people you know in some sense and this is this is also an illustration of a kind of experiment that you might do what you might keep talking about science like I'll show you now a couple of experiments right so we would show people simple physical scenes like these blocks world scenes and ask them to make a number of judgments and the model we built does basically a little bit of probabilistic inference in a game style physics engine it perceives the physical state and imagines a few different possible ways the world could go over the next one or two seconds to answer questions like will the stack of blocks fall or if they fall how far will they fall or which way will they fall or what would happen if say one color of blocks or one material like the green stuff is ten times heavier than the gray stuff or vice versa how will that change the direction of fall or look at those red and yellow stacked blocks some of which look like they should be falling but aren't so can you infer from the fact that they're not falling that one color block is much heavier than the other let me show you a sort of a slightly weird task it's in a behavioral experiment sometimes we we do weird things so that we can test ways in which you use your knowledge that you didn't just you know learn from pattern recognition but use it to do new kinds of tasks that you'd never seen before so here's a task which you know many of you have maybe seen me talk about these things so you might have seen this task but probably only if you saw me give a talk around here before we call this the red yellow task and again we'll make this one interactive so imagine that the blocks on the table are knocked hard enough to bump the table's bumped hard enough to knock some of the blocks onto the floor so you tell me is it more likely to be red blocks or yellow blocks what do you say red okay good how about here yellow good how about
here uh-huh here here okay here here okay so you just experienced for yourself what it's like to be a subject in one of these experiments we just did the experiment here the data is all captured on video sort of right okay you could see that sometimes people were very quick other times people were slower sometimes there was a lot of consensus sometimes there was a little bit less consensus right that reflects uncertainty so again there's a long history of studying this scientifically that you know you could but you can see something you can see the probabilistic inference at work probabilistic inference over what well I would say one way to describe it is over one or a few short low precision simulations of the physics of these scenes so here is what I mean by this I'm gonna show you a video of a game engine reconstruction of one of these scenes that simulates a small bump so here's a small bump here's the same scene with a big bump okay now notice that at the micro level different things happen but at the cognitive or macro level that matters for common sense reasoning the same thing happened namely all the yellow blocks went over onto one side of the table and few or none of the red blocks did so it didn't matter which of those simulations you ran in your head you'd get the same answer in this case right this is one that's very easy and high confidence and quick also you didn't have to run the simulation for very long you only have to run it for a few time steps like that to see what's gonna happen or similarly here you only have to run it for a few time steps okay and it doesn't have to be even very accurate even a fair amount of imprecision will give you basically the same answer at the level that matters for common sense so that's the kind of thing our model does it runs a few low precision simulations for a few time steps but if you take the average of what happens there and you compare that with people's judgments you get results like what I show you here
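To make the "few low precision simulations, then average" idea concrete, here is a minimal Python sketch. It is only a toy under stated assumptions, not the model from the talk: the one-dimensional table, the function names `simulate_once` and `judge_red_vs_yellow`, and every noise level and bump size are hypothetical choices made for illustration, and the "physics" is a single noisy push rather than a game physics engine.

```python
import random

TABLE_HALF_WIDTH = 1.0  # toy table spans x in [-1, 1] (assumed units)


def simulate_once(blocks, bump, perception_noise, rng):
    """One rough rollout: count how many blocks of each color leave the table."""
    direction = rng.choice([-1.0, 1.0])  # the bump pushes everything left or right
    fallen = {"red": 0, "yellow": 0}
    for color, x in blocks:
        perceived_x = x + rng.gauss(0.0, perception_noise)  # noisy perception of state
        push = direction * bump * rng.uniform(0.5, 1.5)     # imprecise dynamics
        if abs(perceived_x + push) > TABLE_HALF_WIDTH:
            fallen[color] += 1
    return fallen


def judge_red_vs_yellow(blocks, bump=0.4, perception_noise=0.02, n_sims=5, seed=0):
    """Average a few rollouts into a graded judgment in [0, 1].

    1.0 means "red is more likely to fall", 0.0 means "yellow"; ties count 0.5.
    """
    rng = random.Random(seed)
    votes = 0.0
    for _ in range(n_sims):
        fallen = simulate_once(blocks, bump, perception_noise, rng)
        if fallen["red"] > fallen["yellow"]:
            votes += 1.0
        elif fallen["red"] == fallen["yellow"]:
            votes += 0.5
    return votes / n_sims


# Yellow blocks sit near the table edges, the red block sits in the middle.
scene = [("yellow", 0.95), ("yellow", -0.95), ("red", 0.0)]
print(judge_red_vs_yellow(scene))
```

In this sketch the individual rollouts differ in detail (bump direction, exact push), yet a scene with yellow blocks at the edges gives the same coarse answer each time, which is the sense in which a handful of imprecise simulations can be enough for a confident judgment. Scenes where the rollouts disagree would produce intermediate values, a rough analogue of the graded judgments compared against people.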
the scatterplot shows on the y-axis the average judgments of people on the x-axis the average judgments of this model and it does a pretty good job it's not perfect but the model basically captures people's graded sense of what's going on in this scene and many of these others okay and it doesn't do it with any learning but I'll come back to that in a second it just does it by probabilistic reasoning over a game physics simulation now we can use and we have used the same kind of technology to capture in very simple forms really just proofs of concept at this point the kind of common-sense physical scene understanding in child in a child playing with blocks or other objects or in what might go on in a young child understanding of other people's actions what we called the intuitive psychology engine where now the probabilistic programs are defined over these kind of very simple planning and perception programs and I won't go into any details I'll just point to a couple of papers that my group played a very small role in but we provided some models which together with some infant researchers people working on both of these are experiments that that were done with 10 or 12 month infants so younger than even some of the babies I showed you before but basically like that youngest baby the one with the cat here's an example of showing simple physical scenes these are moving objects to 12 month olds where they saw a few objects bouncing around inside a gumball machine and after some point in time the scene gets occluded you'll see the scene is occluded and then after another period of time one of the objects will appear at the bottom and the question is is that the object you expected to see or not is its expected or surprising the standard way you study what infants know is by is by what's called looking time methods just like an adult if I show you something that's surprising you might look longer okay if you're bored you'll look away all right so you can do that same 
kind of thing with infants and by measuring how long they look at a scene you can measure whether you've shown them something surprising or not all right there are literally hundreds of studies if not more using looking time measures to study what infants know but only with this paper that we published a few years ago did we have a quantitative model where we were able to show a relation between inverse probability in this case and surprise so things which were objectively lower probability under one of these probabilistic physics simulations across a number of different manipulations of how fast the objects were where they were when the scene was occluded how long the delay was various physically relevant variables how many objects there were of one type or another infants' expectations connected with this model or another paper that we published that one the experiments were done by Ernő Téglás in Luca Bonatti's lab here is a study that was done just recently by Shari Liu in Liz Spelke's lab at Harvard but they're partners with us in CBMM which was about infants' understanding of goals so this is more like again understanding of agents and intuitive psychology where again in very simple cartoon scenes you show an infant an agent that seems to be doing something like an animated cartoon character but it jumps over a wall or rolls up a hill or it jumps over a gap and the question is basically how much does the agent want the goal that it seems to be trying to achieve and what this study showed okay and the models here were done by Tomer Ullman was that infants appeared to be sensitive to the physical work done by the agent the more work the agent did in the sense of the integral of force applied over a path the more the infants thought the agent wanted the goal we think of this as representing what we've sometimes called the naive utility calculus so the idea that there's a basic calculus of cost and benefit you
know we take actions which are a little bit costly to achieve goal states which give us some reward that's the most basic way the oldest way to think about rational intentional action and it seems that even ten-month-olds understand some version of that where the cost can be measured in physical terms okay I see I'm running a little bit behind on time and I wanted to leave some time for discussion so I'll just go very quickly through a couple of other things and I'm happy to stay around at the end for discussion okay what I showed you here was the science where does the engineering go so one thing you can do with this is say build a machine system that can look not at a little animated cartoon like these baby experiments but a real person doing something and again combine physical costs and constraints of actions with some understanding of the agent's utilities that's the math of planning to figure out what they want so look in this scene here and see if you can judge which object the woman is reaching for so you can see there's a grid of four by four objects there's sixteen objects here and she's gonna be reaching for one of them it's gonna play in slow motion but raise your hand when you know which one she's reaching for ok so just watch and raise your hand when you know which one she wants okay so most of them are up by now alright and notice I was looking at your hands not here but what happened is most of the hands were up at about the time when that gray or the one that - line shot up okay that's not human data you provided the data this is our model so our model is predicting more or less when you're able to say what her goal was okay it's well before she actually touched the object how does the model work again I'll skip the details but it does the same kind of thing that our models of those infants did but in this case it does it with a full body model from robotics so we use what's
called the MuJoCo physics engine which is a standard tool in robotics for planning physically efficient reaches of say a humanoid robot and we say we can give this planner program a goal object as input we can give it each of the possible goal objects as input and say plan the most physically efficient action so the one that uses like the least energy to get to that object and then we can do a Bayesian inference this is the probabilistic inference part the program is the MuJoCo planner okay but then we can say I want to do Bayesian inference to work backwards from what I observed which was the action to the input to that program what goal was provided as input to the planner and here you can see the full array of four by four possible inputs and those bars that are moving up and down that's the Bayesian posterior probability of how likely each of those was to be the goal and what you can see is it converges on the right answer at least well it turns out to be the ground truth right answer but it's also the right answer according to what people think with about the same kind of data that people took now you might say well okay I'm sure if I just wanted to build a system that could detect what somebody was reaching for I could generate a training data set of this sort of scene and train something up to analyze patterns of motion but again because the engine in your head actually does something we think more like this it does what we call inverse planning over a physics model it can apply to much more interesting scenes that you haven't really seen much of before so take the scene on the left right where again you see somebody reaching for one of a four by four array of objects but what you see is a strange kind of reach can you see why he's doing that strange reach up there it's a little small but you can see that he's reaching over something right it's actually a pane of glass right you see that and then there's this other guy who's helping him
who sees what he wants and hands the thing he wants so how does the guy in the foreground see the other guy's goal how does he infer his goal and know how to help him and then how do we look at the two of them and figure out who's trying to help who or that in a scene like this one here that it's not somebody trying to help somebody but rather the opposite okay so here's a model on the left of how that might work right and we think this is the kind of model needed to tackle this sort of challenge here right basically we take this model of planning sort of maximal expected utility planning which you can run backwards but then we recursively nest these models inside each other so we say an agent is helping another agent if this agent is acting apparently to us seems to be maximizing an expected utility that's a positive function of that agent's expectation about another agent's expected utility and that's what it means to be a helper hindering is sort of the opposite if one seems to be trying to lower somebody else's utility okay and we've used these same kind of models to also describe infants' understanding of helping and hindering in a range of scenes I'll just say one last word about learning because everybody wants to know about learning and the key thing here and it's definitely part of any picture of AGI but the thought I want to leave you on is really about what learning is about ok I'll be just a few more slides and then I'll stop I promise none of the models I showed you so far really did any learning they certainly didn't do any task specific learning ok we set up a probabilistic program and then we let it do inference now that's not to say that we don't think people learn to do these things we do but the real learning goes on when you're much younger right everything I showed you in basic form even a one-year-old baby can do ok the basic learning goes on to support these kinds of abilities not that there isn't learning
beyond one year but the basic way you learn to say solve these physics problems is what goes on in the brain of a child between 0 and 12 months so this is just an example of some phenomena that come from the literature on infant cognitive development these are very rough timelines you can take pictures of this if you like this is always a popular slide because it really is quite inspiring I think and I can give you lots of literature pointers but I'm summarizing in very broad strokes with big error bars what we've learned in the field of infant cognitive development about when and how kids seem to at least come to a certain understanding of basic aspects of physics so if you really want to study how people learn to be intelligent a lot of what you have to study are kids at this age you have to study what's already in their brain at zero months and what they learn and how they learn between four six eight ten twelve and so on and on up beyond that okay now effectively what that amounts to we think is if what you're learning is something like let's say an intuitive game physics engine to capture these basic abilities then what we need if we're gonna try to reverse-engineer that is what we might think of as a program learning program if your knowledge is in the form of a program then you have to have programs that build other programs right this is what I was talking about at the beginning about learning as building models of the world or ultimately if you think what we start off with is something like a game engine that can play any game then what you have to learn is the program of the game that you're actually playing or the many different games that you might be playing over your life so think of learning as like programming the game engine in your head to fit with your experience and to fit with the possibilities that you seem like you can take now this is what you could call the hard problem of learning if you come to learning from say
neural networks or other tools in machine learning right so what makes most of machine learning go right now and certainly what makes neural networks so appealing is that you can set up basically a big function approximator that can approximate many of the functions you might want to do in a certain application or task but in a way that's end-to-end differentiable and with a meaningful cost function so you can have one of these nice optimization landscapes you can compute the gradients and basically just roll downhill until you get to an optimal solution but if you're talking about learning as something like search in the space of programs we don't know how to do anything like that yet we don't know how to set this up as any kind of a nice optimization problem with any notion of smoothness or gradients okay rather what we need is instead of learning as like rolling downhill effectively right a process which just if you're willing to wait long enough you know some simple algorithm will take care of think of what we call the idea of learning as programming there's a popular metaphor in cognitive development called the child as scientist which emphasizes children as active theory builders and children's play as a kind of casual experimentation but this is the algorithmic complement to that what we could call the child as or around MIT we'll say the child as hacker but the rest of the world if you say child as hacker they think of someone who breaks into your email and steals your credit card numbers we all know that hacking is you know making your code more awesome right if your knowledge is some kind of code or library of programs then learning is all the ways that a child hacks on their code to make it more awesome that more awesome can mean more accurate but it can also mean faster more elegant more transportable to other applications or other tasks more explainable to others maybe just more entertaining okay
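To make the contrast with rolling downhill concrete, here is a minimal sketch of learning as search in a space of programs. It is an illustration under stated assumptions, not a method described in the talk: the four primitives, the tiny sequential language and the brute-force shortest-first enumeration are all hypothetical choices. There is no gradient anywhere, only discrete candidates that either explain the examples or do not.

```python
from itertools import product

# Hypothetical primitive operations of a tiny language over integers.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}

def run(program, x):
    """Apply each named primitive in the program to x, left to right."""
    for name in program:
        x = PRIMITIVES[name](x)
    return x

def shortest_program(examples, max_len=4):
    """Return the first shortest program consistent with all (input, output)
    examples, or None if no program of length up to max_len fits.

    Candidates are enumerated in order of increasing length, so the first
    match is also a simplest one under this crude length measure.
    """
    for length in range(max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None
```

For example, `shortest_program([(0, 2), (1, 4), (2, 6)])` searches programs of length 0, then 1, then 2, and stops at the first one that reproduces all three pairs. The number of candidates grows exponentially with program length, which is one way to see why this kind of search is hard to scale.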
children do all of them have all of those goals and learning and the activities by which they make their code more awesome also correspond to many of the activities of coding alright so think about all the ways on a day-to-day basis you might make your code more awesome all right you might tune you might have a big library of existing functions with some parameters that you can tune on a data set that's basically what you do with backprop or stochastic gradient descent in training a deep learning system but think about all the ways in which you might actually modify the underlying function so write new code or take old code from some other thing and map it over here or make a whole new library of code or refactor your code to some other you know some other basis for that that will work more robustly and be more extensible or transpiling or compiling right or even just commenting your code or asking someone else for their code ok again these are all ways that we make our code more awesome and children's learning has analogs all of these that we would want to understand as an engineer from an algorithmic point of view so in our group we've been working on on various early steps towards this and again we don't have anything like program writing programs at the level of children's learning algorithms but one example of something that we did in our group which you might not have thought of being about this but it's definitely the AI work we did that got the most attention in the last couple of years from our group we had this paper that was in science it was actually on the cover of science sort of just hit the market at the right time if you like and it got about a hundred times more publicity than anything else I've ever done which is partly a testament to the really great work that Brendan Lake who was the first author did for his PhD here but much more so just about the hunger for AI systems at the time when we published this in 2015 and we built a machine system 
that the way we described it was doing human level concept learning for very simple visual concepts these handwritten characters in many of the world's alphabets for those of you who know the famous MNIST data set the data set of handwritten digits 0 through 10 or 0 through 9 sorry that drove so much good research in deep learning and pattern recognition it did that not because Yann LeCun who put that together or Geoff Hinton who did a lot of work on deep learning with MNIST were interested fundamentally in character recognition they saw that as a very simple testbed for developing more general ideas and similarly we did this work on getting machines to do what we called one-shot learning of generative models also to develop more general ideas we saw this as learning very simple little mini probabilistic programs in this case what are those programs they're the programs you use to draw a character so ask yourself how can you look at any one of these characters and see in a sense how somebody might draw it the way we tested this in our system was this little visual Turing test where we showed people one character in a novel alphabet and we said draw another one and then we compared nine people like say on the left and nine samples from our machine say on the right and we asked other people could you tell which was the human drawing another example or imagining another example and which was the machine and people couldn't tell when I said ones on the left ones on the right I don't actually remember and on different ones you can see if you can tell it's very hard to tell can you tell for each one of these characters which new set of examples were drawn by a human versus a machine here's the right answer and probably you couldn't tell the way we did this was by assembling a simple kind of program learning program right so we basically said when you draw a character you're assembling strokes and sub strokes with goals and
sub goals that produce ink on the page and when you see a character you're working backwards to figure out what was the program the most efficient program that did that so you're basically inverting a probabilistic program doing Bayesian inference to the program most likely to have generated what you saw this is one small step we think towards being able to learn programs to being able to learn something ultimately like a whole game engine program the last thing I'll leave you with is just a pointer to sort of work in action right so this is some work being done by a current PhD student who works partly with me but also with Armando Solar-Lezama in CSAIL this is Kevin Ellis it's an example of what's now I think again an emerging exciting area in AI well beyond anything that we're doing which is combining techniques from where Armando comes from which is the world of programming languages not machine learning or AI but tools from programming languages which can be used to automatically synthesize code okay with the machine learning toolkit in this case a kind of Bayesian minimum description length idea to be able to make again what is really one small step towards machines that can learn programs by basically trying to efficiently find the shortest simplest program which can capture some data set so we think by combining these kinds of tools in this case let's say from Bayesian inference over programs with a number of tools that have been developed in other areas of computer science that don't look anything like or haven't been considered to be machine learning or AI like programming languages it's one of the many ways that going forward we're gonna be able to build smarter more human-like machines so just to end then what I've tried to do here is first of all identify the ways in which human intelligence goes beyond pattern recognition to really all these activities of modeling the world okay to give you a sense of some of the domains where we can
start to study this in common sense scene understanding for example or you know something like one-shot learning for example like what we were just doing there or learning is programming the engine in your head okay and to give you a sense of some of the technical tools probabilistic programs program synthesis game engines for example as well as a little bit of deep learning that bringing together we're starting to be able to make these things real okay now that's the science agenda and the reverse engineering agenda but think about for those of you who are interested in technology what are the many big AI frontiers that this opens up so the one I'm most excited about is this idea which is which I've highlighted here in our big research agenda this is one I'm most excited about to work on for the you know it could be the rest of my career honestly but it's really what is what is the oldest and maybe the best dream of AI researchers of how to build a human-like intelligence system a real a GI system it's the idea that Turing proposed when he proposed the Turing test or Marvin Minsky proposed this at different times in his life or many people have proposed this right which is to build a system that grows into intelligence the way a human does that starts like a baby and learns like a child tried to show you how we're starting to be able to understand those things what a baby's mind starts with how children actually learn and looking forward we might we might imagine that someday we'll be able to build machines that can do this I think we can actually start working on this right now and we're and that's something that we're doing in our group so if that kind of thing excites you then I encourage you to work on it maybe even with us or if any one of these other activities of human intelligence excite you I think taking the kind of science-based reverse engineering approach that we're doing and then trying to put that into engineering practice it's it's this is this is 
a this is not just a possible route but I think it's it's quite possibly the most valuable route that you could work on right now to try to actually achieve at least some kind of artificial general intelligence especially the kind of intelligence AI system that's going to live in a human world and interact with human there's many kinds of AI systems that could live in worlds of data that none of us can understand or will ever live in ourselves but if you want to build machines that can live in our world and interact with us the way we are used to interacting with other people then I think this is a route that you should consider okay thank you [Applause] hi there so early in the talk you expressed some skepticism about whether or not industry would get us to understanding human level intelligence it seems that there's a couple of trends that favor industry one is the industry is better than that academia accumulating resources and plowing back into the topic and it seems at the moment we've got a bit of brain drain going on form academia into industry and that seems like a on going trend yeah if you look at something like learning to fly or learning to fly into space then it looks like a story is one of Industry kind of taking over the field and going off on its own yeah a little bit academia academics still have a role but industry kind of dominates so yes is industry going to overtake the field you think well that's a really good question and it's got several good questions packed into one there right I didn't mean to say I didn't this wasn't meant to say go academia bad industry right what I was taught what I what I tried to say was the approaches that are currently getting the most attention in industry and they're really because they're really the most valuable ones right now for the short term you know any industry is really focused on what it can do what are the value propositions on basically a two year time scale at most I mean if you ask say Google 
researchers to take the most prominent example it's pretty much what they'll all tell you okay maybe maybe things that might you know pay off initially in two years but maybe take five years or more to really develop but if if you can't show that it's gonna do something practical for us in two years in a way that matters for our bottom line then it's not really worth doing okay so what when we say what I'm talking about is the technologies which right now industry sees as meeting that specification and what I'm saying is right now I think those that's that's not where the route is to something like human-like but not the most valuable promising route to human-like kinds of AI systems all right but I hope that like in the cases you said you know the basic research that we're doing now will be successful enough that it will get the attention of industry when the time is right but I think so you know I mean I hope at some point you know it won't it will only at least the engineering side will have to be done in industry not just in academia but you're also pointing to issues of like brain drain and other things like that but I think it's these are real issues confronting our community I think everybody knows this and I'm this will come up multiple times here which is you know I think we have to find ways to even now to combine the best of the idea of the energy and the resources of academia and industry if we want to keep doing basically something interesting right if we will if we just want to redefine AI to be well whatever people currently call AI but scaled up well then then then fine forget about it and or if we just want to say let me and people like me do what we're doing at what industry would consider a snail's pace on toy problems okay fine but if but if we want to if you know if I want to take what I'm doing to the level that that will really be you know paying off that level the industry can appreciate or just that really has technological impact on a 
broad scale right or I think if industry wants to take what it's doing and really build machines that are actually intelligent right our machine learning that actually learns like a person then I think we need each other now and not just in some point in the future so this is a general challenge for MIT and for everywhere and for Google I mean we just spent a few days talking to Google about exactly this issue that this was a talk I prepared partly for that purpose so we wanted to raise those issues and and it's just I mean really there I don't know what I mean well rather I can think of some solutions to that problem of what you could call brain drain from the academic point of view or what you could call just narrowing in into certain local minima in the industry point of view but they will require the leadership of both academic institutions like MIT and companies like Google being creative about how they might work together in ways that are a little bit outside of their comfort zone I hope that will start to happen including at MIT and at many other universities and at companies like Google and many others and I think we need it to happen for the health of all parties concerned okay thank you very much things I'm curious about sort of the premise that you gave that one of the big gaps missing at determining intelligence is the fact that we need to teach machines how to recognize models and I'm curious as to what you think sort of non goal oriented cognitive activity comes into play they're things like feelings and emotions and and y-you don't think that might not necessarily be like that the no I'm I was born in questo the only reason emotions didn't appear on my slide is because there's a few reasons but the slide is only so big I wanted the font to be big readable for such an important slide I've had versions of my slide in which I do talk about that okay it's not that I think feelings or emotions aren't important I think they are important and I used to not 
have many insights on it about what to do about them but actually partly based on some of my colleagues here at MIT BCS Laura Schultz and Rebecca Saxe two of my cognitive colleagues in who I work closely with they've been starting to do research on how people understand emotions both their own and others and we've been starting to work with them on computational models so that's actually something I'm actively interested in and even working on but I would say and again for those of you who study emotion to know about this actually you're gonna have Lisa coming in right oh so she's gonna basically say a version of the same thing I think the deepest way to understand she's one of the world's experts on this the deepest way to understand emotion is very much based on our mental models of ourselves of the situation we're in and of other people right think about for example all of the different I mean if you you know if you think about it I mean again Lisa will talk all about this but if you think about emotion as just a very small set of what are sometimes called basic emotions like being happy or angry or sad or you know those are a small number of them right there's usually only few right you might not say you might see that it's somehow like very basic things that are opposed to some kind of cognitive activity but think about all the different words we have for emotion right for example think about an a famous cognitive emotion like regret what does it mean to feel regret or frustration right just to know both for yourself when you're not just feeling kind of down or negative but you're feeling regret that that means something like I have to feel like there's a situation that came out differently from how I hoped and I realize I could have done something differently right so that means you have to be able to understand you have to have a model you have to be able to do a kind of counterfactual reasoning and to think oh if only I had acted to differ way then I can 
predict that the world would have come out differently and that's the situation I wanted but instead it came out this other way right or think about frustration again that requires something like understanding okay I've tried a bunch of times I thought this would work but it doesn't seem to be working maybe I'm ready to give up those are very important human emotions we have to understand to understand ourselves we need that to understand other people to understand communication but those are all filtered through the kinds of models of action that I was talking about here with these say cost-benefit analyses of action so I'm just trying to say I think this is very basic stuff that will be the basis for building I think better engineering style models of the full spectrum of human emotion beyond just like well I'm feeling good or bad or scared okay and I think when you see Lisa she will in her own way say something very similar interesting thanks yeah thanks Josh for your nice talk so all is about human cognition and try to build a model to mimic those cognition but you don't how much could help you to understand how the circuit implement those things hmm I mean like these circuits in the brain yeah yeah is that what you work on by any chance yeah yeah yeah so in the Center for brains minds and machines as well as in brain and cognitive science yeah I have a number of colleagues who study the actual hardware basis of this stuff in the brain and that includes like the large-scale architecture of the brain say like what Nancy Kanwisher and Rebecca Saxe study with functional brain imaging or the more detailed circuitry which usually requires recording from say non-human brains right at the level of individual neurons and connections between neurons all right so I'm very interested in those things although it's not mostly what I work on right but I would
say you know again liking in many other areas of science certainly in neuroscience the kind of work I'm talking about here in a sort of classic reductionist program sets the target for what we might look for like if I if I just want to go I mean I I would I would I would assert right or my working conjecture is that if if you do the kind of work that I'm talking about here it gives you the right targets or gives you a candidate set of targets to look for what are the neural circuits computing right whereas if you just go in and just say poking around in the brain or have some idea that what you're gonna try to do is find the neural circuits which underlie behavior without a sense of the computations needed to produce those behaviors I don't I think it's gonna be very difficult to eat to know what to look for and to know when you've found even viable answers so I think that's you know that's the standard kind of reductionist program but it's not that's it's not I also think it's it's not one that is I'm divorced from the study of neural circuits it's also one if you look at the broad picture of reverse engineering it's one where we're neural circuits and understanding the circuits in the brain play an absolutely critical role okay I would say the mate as an when you look at the brain at the hardware level as an engineer I'm mostly looking at the software level right but when you look at the hardware level there are some remarkable properties one remarkable property again is you know how much parallelism there is and in many ways how fast the computations are okay neurons are slow but the computations intelligence are very fast so how do we get elements that are in some sense quite slow in their time constant to produce such intelligent behavior so quickly that's a great mystery and I think if we understood that it would have payoff for building all sorts of you know Apple basically application embedded circuits okay but also maybe most important is the power 
consumption and again many people have-have have noted this right if you look at the power consumption the power that the brain consumes like what did I eat today okay almost nothing um my daughter who's again she's doing an internship here she literally yesterday all she ate was a burrito and yet she wrote 300 lines of code for her internship project on really cool computational linguistics projects so somehow she turned a burrito into you know a model of child language acquisition okay but how did she do that or how do any of us do this right um we're if you look at the power that we consume when we simulate even a very very small chunk of cortex on our conventional hardware or we do any kind of machine learning thing we have systems which are very very very very far from the power of the human brain computationally but in terms of physical energy consumed way way past what any individual brain is doing so how do we get circuitry of any sort biological or just any physical circuit to be as smart as we are with as little energy as we are this is this is a huge problem for basically every area of engineering right if you want to if you want to have any kind of robot the power consumption is a key bottleneck same for self-driving cars if we want to build AI without contributing to global warming and climate change let alone use AI to solve climate change we really need to address these issues and the brain is a is a huge guide there right I think there are some people who are really starting to think about this how can we say for example build somehow brain inspired computers which are very very low-power but maybe only approximate so I'm thinking here of Joe Bates I don't know if none of you know Joe he's he's been around MIT and other places for quite a while can I tell them about your company so so Joe has a start-up in Kendall Square called singular computing and they have some very interesting ideas including some actual implemented technology for low power 
approximate computing in a sort of brain-like way that might lead to possibly even the ability to build something, and this is Joe's dream, about the size of this table, but that has a billion cores and runs on a reasonable kind of power consumption. I would love to have such a machine. If anybody wants to help Joe build it, I think he'd love to talk to you. But it's one of a number of ideas. Google X people are working on similar things, and probably most of the major chip companies are also inspired by this idea. And I think even if you didn't think you were interested in the brain, if you want to build the kind of AI we're talking about and run it on physical hardware of any sort, then for understanding how the brain's circuits compute what they do, with as little power as they use, I don't know any better place to look.

It seems like a lot of the improvements in AI have been driven by increasing computational power.

Yeah. You mean like GPUs or CPUs?

Yeah.

Yeah, how far would you say we are from hardware that could run a general artificial intelligence of the kind that I'm talking about? I don't know. I'll start with a billion cores and then we'll see. I think there's no way to answer that question in a way that's software-independent; I don't know how to do that. And when you say "how far are we," do you mean how far am I with the resources I have right now? How far am I if Google decides to put all of its resources at my disposal, like they might if I were working at DeepMind? I don't know the answer to that question. But I think what we can say is this, and again this goes back to another reason to study neural circuits: if you look at what we currently call neural networks on the AI side, the model of a neuron is this very, very simple thing,
right? Individual neurons are not only much more complex but have a lot more computational power. It's not clear how they use it, or whether they use it, but I think it's just as likely that a neuron is something like a computer as that it is something like a ReLU; one neuron in your brain is more like a CPU node, maybe. And thus the ten billion or trillion, the large number of neurons in your brain (I think it's like 10 billion cortical pyramidal neurons or something) might be like 10 billion cores, for example. That's at least as plausible to me as any other estimate. So I think we're definitely on the under side, with very big error bars. So I completely agree, if this is what you might be suggesting, and going back to my answer to your question: I don't think we're going to get to what I'm talking about, anything like real brain scale, without major innovations on the hardware side. And it's interesting that what drove the innovations that support current AI was mostly not AI; it was the video game industry. When I point to the video game engine in your head, that's a similar thing that was driven by the video game industry on the software side. I think we should all play as many video games as we can and contribute to the growth of the video game industry. No, I mean, you can see this: there are companies out there, for example a company called Improbable, which is a London-based startup, a pretty sizable startup at this point, which is building something they call SpatialOS. It's not a hardware idea, but it's a kind of software idea for very, very big distributed computing environments to run much, much more complex, realistic simulations of the world for much more interesting, immersive, permanent video games. I think that's one thing that hopefully will lead to more fun new
kinds of games, but that's one example of where we might look to that industry to drive some of the computer systems, really hardware and software systems, that will take our game to the next level.

Understanding at the algorithmic level, or cognitive level, is just understanding the learning; the meaning of learning would be how to predict. But on the circuit level it is different.

But at the what level?

On the circuit level.

Well, of course it's different, right? But already I think you made a mistake there, honestly. You said the cognitive level is learning how to predict, but I'm not sure what you mean by that. There are many things you could mean, and what our cognitive science is about is learning which of those versions it is. I don't think it's learning how to predict. I think it's learning what you need to know to plan actions and to map, you know, all those things. It's not just about predicting, because there are things we can imagine that you would never predict, because they never happen unless we somehow make the world different.

So with generalization you're not predicting, okay, when your model could generalize. But especially in the transfer learning that you are interested in, a few hundred neurons in prefrontal cortex, they generalize a lot.

Yes.

But a Bayesian model won't do that.

You said a Bayesian model won't do that, or they don't do it the way a Bayesian model does?

For sure, because that's at the abstract level.

Well, how do you really know? And what does it mean to say that some neurons do it? So maybe another way to put this is to say: look, we have a certain math that we use to capture these. You could call it abstract; I call it software-level abstractions. All engineering is based on some kind of abstraction. But you might have a circuit-level abstraction, a certain kind of hardware level that you're interested in describing the brain at, and I'm mostly working at, or starting from, a
more software level of abstraction. They're all abstractions. We're not talking about molecules here; we're talking about some abstract notion of maybe a circuit or of a program. Now, it's a really interesting question: if I look at some circuits, how do I know what program they're implementing? If I look at the circuits in this machine, could I tell what program they're implementing? Well, maybe, but certainly it would be a lot easier if I knew something about what programs they might be implementing before I start to look at the circuitry. If I just looked at the circuitry without knowing what a program was, or what programs the thing might be doing, or what kind of programming components would be mappable to circuits in different ways, I don't even know how to begin to answer that question. So I think we've made some progress at understanding what neurons are doing in certain low-level parts of sensory systems and certain parts of the motor system, like primary motor cortex, basically the neurons that are closest to the inputs and outputs of the brain, where you can say we don't need the kind of software abstractions that I'm talking about, or where we sort of agree on what those things already are, so we can make enough progress on knowing what to look for and how to know when we've found it. But if you want to talk about flexible planning, things that are more like cognition, that go on in prefrontal cortex, at this point I don't think that just by recording from those neurons we're going to be able to answer those questions in a meaningful engineering way, a way that any engineer, software, hardware, whatever, could really say: yeah, okay, I get it, I get those insights in a way that I can engineer with. And that's what my goal is: to do that at the software level, the hardware level, or the entire systems level, connecting them. And I think that we can
do that by taking what we're doing and bringing it into contact with people studying neural circuits. But I don't think you can leave this level out and just go straight to the neural circuits. And I think the more progress we make, the more we can help people who are studying at the neural circuit level, and they can help us address these other engineering questions that we don't really have access to, like the power issue or the speed issue.

Thank you.

Okay, thanks. That was great. Let's give Josh a hand. [Applause]
Info
Channel: Lex Fridman
Views: 172,489
Keywords: mit, artificial intelligence, artificial general intelligence, human-level intelligence, robotics, deep learning, machine learning, free, open, josh tenenbaum, brain and cognitive science
Id: 7ROelYvo8f0
Length: 95min 9sec (5709 seconds)
Published: Thu Feb 08 2018