The Thousand Brains Theory

Reddit Comments

Very interesting theory on neocortex connectability @ 14:00

2 points · u/RAlex1828 · Mar 26 2019

This looks extremely interesting... I'll be listening to this on my commute home from work. Thanks for sharing!

1 point · u/TheRealGreenArrow420 · Mar 25 2019
Captions
[Music] Well, it gives me great pleasure today to welcome to Microsoft Research Subutai Ahmad and Jeff Hawkins. Subutai and Jeff have been very gracious to come all this way and pay us a special visit. If I started going through all their accomplishments we would be here for a long time, so I will just touch on the main points, starting with Jeff. Jeff has had a lifelong interest in neuroscience. He created Numenta, which is focused on neocortical theory. You all probably know that he invented the PalmPilot, which is just incredibly cool, and ever since that time he has been focusing on neuroscience and doing great things there. In 2002 he founded the Redwood Neuroscience Institute, where he served as director for three years, and you're going to be incredibly honored to hear his thoughts on neuroscience. Then we have Subutai, who is the vice president of research at Numenta and is going to talk to us today about their deep learning vision. He has been an entrepreneur, has contributed to their detailed theory of the neocortex, and holds a PhD in computer science from UIUC. We are going to have lots of fun, so thank you very much.

All right, great; thanks, Des. Well, I hope we have fun; we always try to have fun. Just a couple of words about how this meeting came about. Subutai and I were in Seattle back in December; we spent the day at the Allen Institute for Brain Science, where we gave a talk, and at the end of that day, through a mutual acquaintance, we came over here to Microsoft and had dinner with Satya Nadella and Eric Horvitz and a few other people, just to talk about what's going on in brain theory. After that dinner we said, hey, maybe it would make sense to come and talk about our work here at Microsoft Research, and that's the genesis of why we're here. So let's jump right into it. The title of our talk is "The Thousand Brains Theory of Intelligence: A Framework for Understanding the Neocortex and Building Intelligent Machines." Here's how it's going to work: I'm going to give half the talk and Subutai is going to give the other half.

Just a word about Numenta. We're a small company in Redwood City, in Northern California; we've been around for about 14 years and we have about ten employees. The way to think about us is as a research lab; that's what we are, a somewhat independent research lab, and we have a scientific mission. Our scientific mission is to reverse-engineer the neocortex, and I'm going to talk about that. Our goal is a biologically accurate theory; this is a biological research mission. This is not about looking for inspiration from the brain; we're not interested in theories merely inspired by the brain; we really want to sort out how the neurons in the brain work. We test our theories via empirical data and simulation. We don't do experimental work ourselves, but we do that in collaboration with others and with published data, and everything we do is open and published. Everything I'm going to present here today has been published in peer-reviewed journals, with a few exceptions that haven't been published yet, but we don't hold anything back; there are no secrets in what we do. We have a second mission, which is really second in both priority and order, and it is the following: to take what we've learned from neocortical theory and apply it to AI. When I got into this field almost 40 years ago, I felt immediately, back then, that if we're ever going to build truly intelligent machines we have to understand how the brain works, and I felt that this was really the path there; I still believe that. For the last three years at Numenta we have done almost nothing on the second mission, and the reason is that we were having so much success on the first mission that we put everything else aside just to focus on it. I'm
going to tell you about that. In the last few months, though, Subutai has been re-engaging on the machine learning side of our mission statement, and he's going to talk about the recent progress we're starting to make there. Okay, so that's who we are.

Just to remind you: the neocortex is about 75% of the human brain by area. The other 25% covers things like the autonomic nervous system, which controls heart rate and breathing; various basic behaviors and reflex reactions; even things like walking and running are controlled by the old brain, and your emotions are in the old brain. But the neocortex dominates our brain, and it's the organ of intelligence. Anything you think about, from our perceptions to language of all different types, mathematics, engineering, science, art, literature: everything you're paid to do here comes out of your neocortex. Right now my neocortex is speaking and yours is listening. So that's the organ we want to understand.

The first question you might ask is: what does the neocortex do? It took me a while to understand this completely. Some people think, oh, it gets some inputs from the sensors and then you act. That's not really true. What the neocortex does is build a model of the world. When you're born, the structure is there, but it really doesn't know anything; you have to learn a model of the world, and your model of the world is extremely complex and rich. There are tens of thousands of objects you know in the world. You know how they look, how they feel, how they sound; you know where these objects are located and how they interact with one another. Objects have behaviors, and you have to learn those too. This smartphone is an object, and it has all these complex behaviors: as I manipulate it, things change on it. That's all stored in the neocortex. Even simple things: a door in this room has behaviors; it opens, the handle turns, and so on; you have to learn all that. The neocortex learns both physical and abstract things, so you can literally learn what a coffee cup is, what it looks like and how it feels, but you can also learn concepts like, say, democracy: things you've never been able to experience directly. It builds this very complex model of the world. Why does it do that? The advantage, from an evolutionary point of view, is that it's a predictive model, meaning it's constantly predicting what's going to happen next in the world: what you're going to see, what you're going to feel, what you're going to hear. You're not aware of most of those predictions, but they're happening. And because it's a predictive model, this very complex model of the world, you can use it to generate behaviors. You may sit in this room listening to me, not really doing anything for a while, but I'm changing your model of the world slightly through this talk, and later you might behave differently based on that model.

So the question is: how does this structure, which fits in your head, learn this very complex model of the world? If you were to take the neocortex out of your head and iron it flat, it would be about this big: about 1,500 square centimeters, roughly 15 inches on a side, and just about two and a half millimeters thick. If you could do this and look at its surface, there are no demarcations; you can't see what anything does; it looks like one uniform sheet. But we do know it's divided into functional regions. In humans they don't really know how many, but the estimate is a little over a hundred different regions that your neocortex divides into: regions responsible for vision and hearing and touch and language and so on, and they're connected together through the white matter in a very complex way. If I'm just going
to show you here: I've highlighted a few of these. There's a bunch of visual regions in the back of your brain, and there are auditory regions; these are the primary areas where the sensory input comes in. Your eyes project to the back of your head, your ears project to the temporal lobes, and your body projects its somatosensory senses, touch, to this area across the top.

The story that's usually told about how this works, the conventional view, which I'd argue is really not correct at all, goes like this: you have something like your eye, and the retina projects to a first region back here, which somehow extracts simple features; that projects to the next region, which extracts more complex features; in the brain you do this three or four times, and then all of a sudden you have cells that represent entire objects. It's a feature-extraction-in-a-hierarchy sort of paradigm. It's presumed something similar is happening in the other sensory modalities too, although it's not at all clear how, because you don't have the regular topology on your skin that you have in the retina, but the same idea is supposedly happening. And then somehow there are multimodal objects occurring in all these other areas, which people really don't understand at all. So we have somewhat of an idea of what's going on in the primary sensory regions, though not really very well, and then somehow all this other stuff is happening above that. That's the story that's told, the classic view; if you pick up a neuroscience textbook, that's what it will say.

Here's the reality. We can't make this picture for humans, but this is a primate, the macaque monkey. This is a very famous diagram in neuroscience, if you don't know it; it came out in 1991 from two scientists, Felleman and Van Essen. These rectangles are regions of the monkey's neocortex. It's smaller than ours, but it's basically the same idea. These are all the different regions, and each of these lines represents millions of nerve fibers connecting them in various ways. This thing is really, really crazy complicated; it doesn't look simple like that story up there. We can make a few observations about it. The vast majority of the connections between regions here are not hierarchical at all; they go all over the place. Something like 40% of all possible connections between regions exist, and many regions get input from 10 or more other regions. And since this diagram was made, almost 30 years ago, we've learned there are even far more connections that they missed, going across from all different places. In this diagram you have the touch regions on the left and the vision regions on the right, and it's presented as sort of a hierarchy, but it's not really like that at all. This is what our brains are like, except even more complicated. Somehow this is the organ of intelligence, and we want to understand all of it, which seems quite daunting.

The next thing we can do is look at the detailed circuitry of the neocortex, within its two and a half millimeters of thickness, and see what the circuitry in there looks like. This has been studied for well over 120 years now. These drawings were made by Cajal back in 1899. This is the two-and-a-half-millimeter thickness, with the top toward the skull and the bottom toward the center of the brain. What they could see in those days were cell types: these are individual cell bodies, and you can see there's a sort of stratification going on. And here they're showing the axons and dendrites, the processes coming out of the cell body that connect to other neurons, and you can see these are predominantly vertical. There have literally been, and I'm not
joking, tens of thousands of papers published on the architecture of the neocortex that's represented here, going back to the 1800s. Here's a picture I made showing some of the dominant connections; we spend a lot of time reading these papers. You can see the different layers. They're labeled one through six, but that's really a misnomer, just a rough guide. There are these very prototypical projections that go between cells in different layers, and there are different types of connections; I've indicated them here in blue and green. It's very much more complicated than this, but this is a sort of prototypical circuitry, and we can make a few general observations about it. One observation: there are dozens of different types of neurons, and by different types I mean they have different response properties, different connectivity, maybe different gene expression, and so on. They're roughly organized into layers, and most of the connections in the neocortical architecture run across the layers: input comes into layer 4 and then goes up and down, back and forth, like this. There are very limited horizontal connections; a few layers project long distances, but most of the information goes up and down, up and down, and then spreads, in a few layers, over long distances.

One of the surprising things, which people didn't know until about 20 years ago, is that all regions in the neocortex have a motor output. We tend to think: there's a sensory input, say from the eyes, it goes up and down the hierarchy, and then you do some behavior. It turns out that everywhere you look there are cells in layer 5 which project someplace else in the brain or body and make movements. Even the primary visual cortex, the first part that gets input from the retina, has cells that project back to an old part of the brain that moves the eyes, and regions of the auditory cortex project to old parts of the brain that move your head. So every sense in the brain is a sensorimotor system; there's no pure sensory input; it's sensorimotor all the way; every region is sensorimotor.

Now, a couple more observations. This circuitry, this complexity, is remarkably the same everywhere. It doesn't matter whether you look at regions doing language or vision or hearing or some unknown function: it looks like this. This is an area of contention, because you can find differences between the regions: some regions have more of this cell type or that cell type, some are a little thicker, some a little thinner, some have an extra special little thing here and there. But the variation between the different areas of the neocortex is remarkably small compared to the commonality. That's an incredible observation. The other thing we can say is that this is a very complex circuit, and it's going to do something complex, not some simple function; the complexity is there for a reason.

So how do you make sense of all this? The first person to make the really big observation was this man, Vernon Mountcastle, a neurophysiologist at Johns Hopkins. He published the idea in 1978, in a roughly fifty-page essay which is now famous, and it's one of the most incredible ideas of all time; I'd put it up there with Darwin's in terms of significance. His observation was this: the reason all the regions of the neocortex look the same is that they're all performing the same intrinsic function. They're all doing the same thing, and what makes a visual area vision, a language area language, and a somatosensory area touch is what you connect it to. Take a region of cortex and connect it to an eye and you get vision; take a region of cortex and connect it to ears and you get hearing. And that
the brain got big by just replicating the same thing. He then said that a cortical column, which is a little under a square millimeter (I'll just call it a square millimeter), contains all the essential circuitry you'll see anywhere, so the cortical column is the unit of replication, and if you can understand what a cortical column does, you understand the whole thing. You can visualize it like this: here's the neocortex, made of these columns. Now, they don't literally look like this; you can't actually see them stacked up this way, but functionally and anatomically one can argue this is how it is. There are a few places in some brains where you actually can see columns, but mostly you won't see them, yet they really do exist. So in a human we have 150,000 of these things: 150,000 copies of the same basic circuitry.

There's a corollary to what Vernon Mountcastle proposed, which I came up with; I don't know of anyone else who has pointed it out. If you actually believe what he said, that every column is doing the exact same thing, then by definition every column must perform the same functions that the entire neocortex does; there is no other place for things to happen. So if prediction is occurring someplace in the brain, it must be occurring in every column; if I can learn sequences and play back sequences, that must occur in every column; and so on. This is such a crazy idea that most neuroscientists find it hard to imagine, but that's what he proposed, and it turned out to be true.

I'm going to jump forward here. We've done many, many years of research on this, piecing apart different aspects of how neurons work, how dendrites work, how synapses work, and how the circuitry in the neocortex works. I'm going to skip all of that and go to something that happened three years ago. We had built up a base of knowledge about a lot of things going on in there, but three years ago we had an insight that blew the whole thing open, and the insight started with this coffee cup. I was literally in my office, playing with this coffee cup, and I asked a very simple question. As I touch this thing with my finger and move my finger around, I make predictions about what it's going to feel. I said to myself: what does the cortex need to know to predict what my finger is going to feel when I move it and touch the lip? I can imagine that feeling; what does it take to know it? First, the cortex has to know it's holding a coffee cup, because different objects would lead to different predictions. Second, it needs to know the location of my finger in the reference frame of the coffee cup, relative to the coffee cup; it doesn't matter where the coffee cup is relative to me, or what its orientation is; I have to know where my finger is in the reference frame of the coffee cup. I need to know where it is, and I need to know where it will be after I execute a movement: while I'm executing the movement, the brain has to predict the new location, and then, based on the model of the coffee cup, it can predict what it's going to sense. I very quickly realized that this is a universal property of the cortex: every part of your skin, as I hold this cup in my hand, is predicting what it's going to sense as it moves over the coffee cup, and every part has to know where it is in the reference frame of the cup; it has some sort of sense of location on the objects it's manipulating. This was a new idea. We ran with it and published our first paper on this in 2017, where we explained a lot of the detailed mechanisms of what we think is going on; here I'll give you just the highlights.
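The prediction question posed with the coffee cup can be sketched in a few lines of Python. This is only a toy illustration, not Numenta's actual model: the object, its locations, and its features below are invented for the example. The point it captures is that prediction requires three things: knowing which object you're touching, knowing where the sensor is in the object's own reference frame, and integrating the movement to get the next location.

```python
# Toy sketch of the coffee-cup prediction question (illustrative only):
# an object model maps locations, in the object's own reference frame,
# to the features sensed at those locations.

coffee_cup = {
    (0, 0): "smooth ceramic",   # side of the cup
    (0, 5): "rounded lip",      # top rim
    (3, 0): "curved handle",    # handle
}

def predict_sensation(object_model, location, movement):
    """Predict what the finger will feel after executing `movement`.

    The cortex must know (1) which object it is touching and
    (2) where the finger currently is relative to that object.
    """
    # Path-integrate the movement to get the new location on the object.
    new_location = (location[0] + movement[0], location[1] + movement[1])
    return new_location, object_model.get(new_location, "unknown")

# Finger on the side of the cup, about to move up to the rim:
loc, feeling = predict_sensation(coffee_cup, (0, 0), (0, 5))
print(loc, feeling)  # (0, 5) rounded lip
```

Note that the prediction is independent of where the cup is relative to the body: only the location in the cup's reference frame matters, which is exactly the point being made in the talk.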
So you have your hand, your finger, touching this cup, and that input typically goes into layer 4, which is the primary input layer in any region; that's your sensed feature. There's this other very major connection between cells in layer 6 and layer 4, which is well documented and has a certain type of effect on layer 4; I've indicated it in blue here. We proposed that this is representing the location relative to the object. Now we have two things: the sensation, and the location relative to the object. Then we can integrate over time and form a representation of the object itself. So think of these cells as representing the coffee cup: they're changing based on both the sensory input and where it is on the cup, and you can simply integrate over time and build a model of the cup, really a model of the cup's morphology: what features are at what locations. Your brain has to know this. We detailed the actual mechanism of how the neurons do this in quite some detail; I'm not going to go through that today, but I'll touch on it a little later.

We then said, well, you've got multiple columns going on here, so imagine different parts of your skin touching the cup at the same time. Where you have just one finger, in order to either learn the cup or infer the cup, you have to move the finger. Imagine reaching into a dark box and trying to recognize what something is: you have to move your finger to do that. But if I reach in and grab it with my whole hand at once, I don't have to; I can recognize the cup with one grasp. The reason is that there are these long-range connections in the upper layers, layers 2/3, that go across large areas of the neocortex, and they implement a voting mechanism. Each column is getting some different input and has a guess as to what it might be feeling; it doesn't really know, but the columns vote together and settle on the only thing that's consistent with all the locations and sensations they have. We modeled this and showed how it works.

You can make the same argument for all your senses. You might think vision is different, but it's not: the best way to think about the retina is as a topological array of sensors, just like your skin, and each column in the primary visual cortex is looking at a small part of the visual object; no column looks at the whole thing. Each part of the visual cortex is like looking through a little straw, just like a fingertip: if I had to look through a straw I'd have to move it around to see what I'm looking at, but if I have all those columns active at once, bingo, I recognize the object. We walked through all the details of this, but there's a big question: how is it possible for neurons to establish a reference frame for an object when they don't even know what the object is yet, and then know where they are on that object? How could that be done with real neurons? In that paper we proposed where to find the answer, and the proposal turned out to be correct, so now we'll skip to that. It turns out there is a very well-studied structure in the brain called the entorhinal cortex, which is not part of the neocortex; it's part of the old brain, and it creates reference frames. What we propose is that the cell types that evolved long ago in the entorhinal cortex now also exist in the neocortex. So let me tell you about grid cells. These things are called grid cells; you might have heard of them; they're very famous. They were first discovered by the Moser team in 2005, and what they do is create reference frames for an environment. This has been studied mostly in rats; in blue here is the rat's entorhinal cortex, but you and I have a
small entorhinal cortex too, and it has these grid cells in it. What do I mean by an environment? This room is an environment. My grid cells right now have established a reference frame in this room, and their activity tells me where I am in the room. Even if I close my eyes and walk over here, I have a sense that I'm in a different location: I know I've moved, I know the old spot was over there, I know I'm a little closer to this wall and farther from that wall. Even without any sensory input, there's an internal system tracking where you are in this room, and that's what's happening. So grid cells create this reference frame; they represent the location of the body in the room, or environment, and this is useful for building maps of rooms (where are the things in the room?) and for navigation (how far am I from that door, and how many steps would I have to take to get there?). Grid cells provide a metric space for doing this, and they evolved in the old part of the brain for navigation, one of the first things animals had to do when they started moving around the world. Our hypothesis was that grid cells also exist in the neocortex, in every column, and there they create reference frames for the objects you interact with in the world, both physical and abstract objects. They represent the location of that cortical column's input in that reference frame, and they're needed for learning the structure of objects in the world and for navigating our limbs and different parts of our bodies relative to those objects. We have now detailed different parts of this in a series of papers. Here's a visual picture of what I just told you. On the left side is the entorhinal cortex: these are two rooms a rat might be in, and the letters represent locations in a room. When the rat is at this location, these cells have one pattern of activity; call it A.
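The location tracking described here, updating a sense of place purely from self-motion, can be caricatured in Python. This is a deliberately simplified sketch, not the published model: real grid cells fire on a two-dimensional hexagonal lattice, but the essential ideas survive in one dimension, namely that grid cells come in modules, each module tracks a phase that wraps around at its own spatial period (so there is no absolute zero point), and the combination of phases across modules pins down a location over a much larger range than any single module covers.

```python
# Toy sketch of grid-cell-style location tracking (illustrative only).
# Each module maintains a phase that wraps at its own period and is
# updated by path integration, i.e. by accumulating movements alone,
# with no absolute origin.

class GridCellModule:
    def __init__(self, period):
        self.period = period   # spatial scale of this module
        self.phase = 0.0       # current phase; no privileged zero point

    def move(self, displacement):
        """Path integration: update the phase from self-motion alone."""
        self.phase = (self.phase + displacement) % self.period

# Several modules with different (here coprime) periods; together their
# phases uniquely encode position over a range far beyond any one period.
modules = [GridCellModule(p) for p in (3.0, 5.0, 7.0)]
for m in modules:
    m.move(11.0)   # walk 11 units with eyes closed
print([m.phase for m in modules])  # [2.0, 1.0, 4.0]
```

The "eyes closed" walk in the talk corresponds to the `move` calls: no sensory input is consulted, yet the phase combination still identifies where you are in the room.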
When the rat is in a different section, a different pattern of activity, B, occurs, and different patterns in the other sections. It doesn't matter how the rat got to C; whenever it's in that position, that pattern of activity occurs. The same thing is going on over here, but now, going back to the fingertip, you have objects like my coffee cup or a pen, and there is basically a reference frame of points around the object; I've labeled some of them here. As you move your finger relative to these objects, those cells change, representing the location of the sensory input relative to the object. The way grid cells work is really cool; it's fascinating, but it's not easy to describe in a very short talk. In a longer talk I could tell you all about it, but just trust me, there's a really clever mechanism. It's not Cartesian coordinates; there is no zero point; it's a sort of self-referential reference frame, and neurons do this; nature figured out how to do it. It's very cool. Okay, so then we went back. At that point we did not know how these cells were representing location, but we knew that grid cells come in modules, so I'll call them grid cell modules, and in every cortical column there are grid cell modules. Last year we published a paper describing in great detail how these modules work, with this interaction back and forth: how they figure out where you are at the same time you're touching something. So we worked out the detailed mechanisms for this. Okay, I'm now going to jump forward to today. Most of this has been published, but not all of it; we're going to do that this year. I'm going to tell you the full complexity of what we think is going on in a cortical column. [Audience question] Yes; Subutai is going to talk about this. The question, for the people online, is: how does the brain keep track of multiple hypotheses? Let me
rephrase: how does the brain keep track of multiple hypotheses? Because the column doesn't know where it is yet, but it might have some idea: it can say, given what I've sensed, I could be at this set of places on this set of objects. One of the discoveries we made a number of years ago, and we sometimes don't talk about this, so I'll just give you a clue about it: all the activations in the brain are sparse, meaning at any time very few cells are active and most cells are inactive. The way the brain represents uncertainty is by forming unions of patterns: it activates multiple patterns simultaneously, and because the patterns are sparse, it doesn't get confused. Surprisingly, though it's not obvious up front, the brain can activate multiple guesses at the same time in the same set of cells. What you see in neural tissue is that when there's uncertainty you have much higher activity rates, and when you're certain the activity gets very, very sparse. So there isn't a buffer someplace, and it's not a classic probability distribution; it's literally a union of hypotheses active at the same time, and the mechanisms we worked out show how that union gets resolved. That's an important point, because we think this way of representing information is really critical, and Subutai is going to talk about it. I hope that answers the question sufficiently for now.

Let me explain what we now believe is going on in a column. We believe there are actually two reference frames: there are two sets of cells, in layers 6a and 6b, so a column is able to represent two spaces at once. What does the column learn? When it's observing something, it learns, first, the dimensionality of the object: there's no built-in assumption about whether it is a one-dimensional, two-dimensional, three-dimensional, or n-dimensional object; it can learn the object's dimensionality.
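The union-of-sparse-patterns idea in that answer can be illustrated with a toy Python sketch. The parameters are invented for the example (2048 cells with 40 active, about 2% sparsity, is in the spirit of the sparse codes described, but not taken from any particular paper): several sparse patterns can be superimposed in the same set of cells, and each hypothesis is still individually recognizable by overlap, because two random sparse patterns barely overlap by chance.

```python
import random

# Toy sketch of sparse representations and unions (illustrative only).
# With very sparse patterns, several hypotheses can be active at once in
# the same cells (a union) without confusion, because accidental overlap
# between random sparse patterns is tiny.

N, W = 2048, 40            # 2048 cells, only 40 active (~2% sparsity)
random.seed(0)             # fixed seed so the example is repeatable

def sparse_pattern():
    return frozenset(random.sample(range(N), W))

patterns = {name: sparse_pattern() for name in ("cup", "pen", "phone")}

# Uncertainty = a union of the active hypotheses; note the higher
# overall activity (80 cells instead of 40), as described in the talk.
uncertain = patterns["cup"] | patterns["pen"]

def matches(candidate, active, threshold=30):
    """A stored pattern is 'recognized' if most of its cells are active."""
    return len(patterns[candidate] & active) >= threshold

print([name for name in patterns if matches(name, uncertain)])  # cup, pen
```

As more evidence arrives, the union shrinks toward a single pattern, and the activity becomes very sparse again: that is the "union gets resolved" step described above.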
In that dimensionality it can learn the morphology of the object: what features exist at what points in that space. It can learn changes in morphology; we detailed last year, in that paper, how this could work. Literally: here's a laptop, and when the lid goes down and up, I have to learn that change in the morphology of the thing; the column has to be able to learn such things, the behaviors of objects. It's able to learn both compositional and recursive structures. Here's an example of a compositional object: I have this coffee cup, and there's a logo on it. The logo was learned before I ever had a coffee cup with a logo on it; now I've learned a new object, the coffee cup with the logo, and I don't want to have to relearn the logo; I want to be able to assign the logo to the coffee cup. That's a compositional object. It also covers recursive structures, in the sense that I could have a logo with a coffee cup in it, and that coffee cup could have a logo on it, and so on; and recursive structures are essential for language, among other things. This system of course also has a motor output, so it generates motor behaviors, and it can apply to any kind of object, a physical object or an abstract object. If you were a cortical column, you would not know what your inputs represent, and you would not know what your motor outputs represent; you're just a bunch of neural tissue up there, trying to figure out how to model the input space you're given, given some ability to generate behaviors. If you attach this to part of the retina, you'll learn very simple visual objects; if you attach it to the output of some other region, you might get very abstract things. Okay. So at the moment you have 150,000 copies of this in your brain. How does it all come back together again? This is where the title of my talk comes from.
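The cup-with-a-logo example can be sketched in Python. This is only an illustration of the idea, with invented locations and feature names: an object model can store, at a location, either a raw feature or a reference to another, previously learned object model, so the logo is learned once and shared rather than relearned, and nesting models inside models gives the recursive structure mentioned for language.

```python
# Toy sketch of compositional object models (illustrative only).
# At a location, a model stores either a raw feature or a *link* to a
# previously learned object model -- the logo is learned once and merely
# referenced by the cup, never relearned.

logo = {"name": "logo",
        "features": {(0, 0): "letter N", (1, 0): "letter U"}}

coffee_cup = {"name": "cup",
              "features": {(0, 0): "smooth ceramic",
                           (2, 3): logo}}   # a link, not a copy

def feature_at(model, path):
    """Follow a path of locations down through nested object models."""
    node = model
    for loc in path:
        node = node["features"][loc]
    return node

# Descend into the logo-on-the-cup, then to a feature within the logo:
print(feature_at(coffee_cup, [(2, 3), (0, 0)]))  # letter N
```

Because the cup holds a reference rather than a copy, any refinement of the logo model is automatically shared by every object that contains it, which is the point of not relearning it.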
brains theory of intelligence. I mentioned earlier the classic view, a hierarchical feature extraction model, which is basically what underlies a convolutional neural net these days. But the brain is not like that — the brain has some hierarchical structure too, but as I mentioned earlier, most of the connections are not that way. So here's the alternative view: we have all these columns, and they're all doing the same thing, all doing this complex modeling. If I sense a cup — I touch it and I look at it, maybe at the same time — I'm going to invoke a whole bunch of models at different levels in the hierarchy, all about cups. Now, these models are not identical; they vary in different ways. Of course, the ones on the left here are all going to be vision related, and the ones on the right are going to be somatosensory, touch related. But they're also going to be on different parts of your retinal space — some models built on this part of the retina, some on that part — and likewise these here are built on different parts of your skin. Even within those, you might have some models built primarily on color and others on different types of black and white, and here you might have some models that incorporate temperature, which can tell you about the surface material, while others don't. It doesn't really matter: you have this array of columns all modeling the same object. And then these long-range connections, which I'm just hinting at here, basically allow all these models to vote. Even at the lowest levels — even in the primary visual region and the primary somatosensory region — we find these connections going across, which make no sense in a hierarchical model, but they make sense here: everybody's trying to guess what's going on. So this allows you to resolve ambiguity, it allows you to do inference much faster, without movement,
and it's why you have a single percept of the world. You may have thousands of models of the object being observed at the same time, but what you're perceiving is this sort of crystallization across the upper layer, which says: yeah, we all agree now, this is a coffee cup. Lots of columns are contributing to that at any moment, and different models can come in and out based on obscuring of data, but it doesn't really matter — we all know it's a coffee cup. So that's what we call the thousand brains theory of intelligence. My last slide is the following, and then I'll turn it over to Subutai. The question is: will these principles be essential for the future of AI? Do we even care how brains work? There's a lot of success going on in AI right now, and most of the things I just talked about — some of them, but most of them — are not part of that. Well, as I said earlier, I believe some of these will be essential, if we think about where AI is going, what we want it to be, and how crude our systems are today compared to what they could be. So here are some things I think are absolutely essential in the medium and the long term — not the very short term. If we really want to get to the future of AI, you're going to have to have systems that are sensorimotor — that do sensorimotor learning and inference. You cannot learn the structure of the world without moving; you can learn only very impoverished models of the world, because we learn mostly by moving through the world — I can't learn what this building is unless I move through it. You talked about the SLAM thing earlier; in my mind, AI and robotics are not separable. They're not really separate problems; they're the same problem. These models are going to be based on object-centric reference frames — that's very clear to me now. Will they have to do it the way grid cells work? Maybe, maybe not, I don't know, but it's going to be worked on objects
in reference frames, the way grid cells do it. The way grid cells do it is pretty cool, so that might be the right way of going about it. And then there are going to be many small models with voting. This has a lot of advantages — robustness is one — and it's hard to imagine you could really build a true, complete AI system that didn't work this way. In the short term, or the near term, there are some things we can do, and this is my segue into Subutai's talk — he's going to talk about these things right now. I mentioned earlier these sparse representations; that's the way the brain works, and it leads to very strong robustness in the representations. And there's the neuron model that we use — I didn't talk about that, but Subutai will. We model neurons quite differently from the point neurons used in artificial neural networks; real neurons don't work like that, and some of the properties of real neurons are essential for continuous and online learning. Subutai is going to talk about that. So — not that you were going to applaud, but if you were, you can hold it. Now we'll switch. You have a question? [Question, partly garbled: ...so I'm wondering where that hierarchy comes from — is it like a conceptual hierarchy also?] So you got the idea that every column is doing the same thing, right? Let me spell it out. If you look at the visual regions in the neocortex, the first three visual regions — V1, V2, and V4 — all receive direct input from the retina. It's not just going to V1; it's projecting multiple levels up, but not to all levels. So one of the things going on there is that these different regions are actually modeling objects at different spatial sizes on the retina. If I had the smallest
possible thing I could see — like the smallest print that I could read — I'm telling you, that's being recognized in V1. Most people would be very surprised to hear that. Now, in addition to the retina projecting to multiple regions, there's convergence from V1 to V2 and from V2 to V4, so there's both hierarchy and non-hierarchy going on at the same time. We don't have a detailed model of how the hierarchical composition works yet, but that's what it looks like. So it's not completely flat, but it's definitely a lot flatter than most people think, and very few people would think that even in these primary visual regions, or the primary regions of somatosensory cortex, real object recognition is going on. I'll give you a couple of data points you might find interesting. Mice can see pretty well — they can do a lot of interesting things with vision — and mice really have only V1; they have almost no V2, so almost all of their vision occurs in the primary visual region. And in humans and monkeys — take the macaque I talked about earlier — 25% of the entire neocortex in the monkey is V1 and V2. These regions are huge, and all the other regions are much smaller, and that too tells us that most of what we know about vision is occurring in these lower regions. So it upends this idea that you've got feature extraction going on down low and some big thing happening up at the top — it's just the opposite: the big stuff is happening down here, and then there's convergence as you go up, with more high-level concepts that we don't really understand exactly yet. Let me make sure I understand the question — are you asking whether those differences are genetic, or learned? Let me try to answer that. First of all, the basic theory says the differences are really primarily just where the columns are and what they're getting input from. Much of that is genetically determined: at birth you
have this structure wired up like that. And there are also differences at birth — I mentioned there are differences between these regions, so evolution has figured out that in some regions it might want a little bit more of this cell type. At birth you have those differences, but primarily it's just where the columns are and what the topological input is. Now, the system is very flexible. If you were born without a retina — if you're congenitally blind — these regions don't atrophy. What happens is they get recruited: input starts coming to them the other way, and they actually operate on it. So there's a lot of learning flexibility going on here, but nobody's in charge. Maybe I didn't understand your question, but it's not precise at all — you can hook these things up almost any way and it'll work. The question is, if you hook it up a little bit differently, you might get a better result. Let me give you another example. In humans, the area of V1 — one of the largest regions in the brain — varies among normal humans by a factor of three. Presumably, from what I've read, people who have larger V1s are better at high-acuity vision — if I were working with very small things, I'd be better at that — and people with smaller V1s are not as good at that. But everybody sees, they all think they're normal, life goes on; some are just going to be a little bit better at it than others. So there's a lot of flexibility in this system. There's all kinds of rewiring: with trauma, you knock some of these things out and everything keeps working; you rewire stuff and it keeps working — there's the famous experiment where Mriganka Sur rewired the ferret's brain and it kept working. You can tune it and make it better, but it's robust pretty much however you wire it up. [Question:] Well, it has an input, but
it's not all sensory, right? It could be from other columns? Yes, that's right. In the case of the finger, literally, the part that's getting input from the fingertip projects back down toward the spinal cord, most likely connecting to the muscles that would eventually move the finger. In the retina it's a little different, because the retina moves all at once, so all the columns in the visual cortex project to this one thing called the superior colliculus, which moves the eye en masse. There's less of a topology there, because I can't move different parts of my eye independently. But it's certainly true of the somatosensory system — there's a topological arrangement to it. Yes, over there — I'll take this as the last question. [Question:] Are the columns really physically separated, or is it just a convenience that we isolate them? It's mostly a convenience. Remember this picture here — I said this is not what it really looks like; you don't see these columns. But there's a lot of evidence that if you were to move a probe across the cortex, even though you don't see the columns, there is a sort of physiological break — not a physical break, you can't see a dividing line, but a physiological break between what this section here represents and what this section here represents. That's been documented. And there are a few places, like rodent somatosensory cortex, where you can see it: whiskers are a very special organ, an active sensing organ, almost like our fingers — rats and mice move their whiskers in an active way — and in what they call the barrel cortex in the rodent, you literally do see columns: there's one column per whisker. It's direct evidence that this is what's going on — the whisker is like a fingertip, and the rat has an array of
whiskers and a corresponding array of columns, and those you can see. But mostly it's a physiological change as you go across the surface of the cortex — it's real, it's not just one continuous thing, but in most of the cortex you don't see it physically. One other point, which maybe we'll talk about later: in the cortex there are really only two sets of cell layers that project across columns broadly, and I mentioned that those are both voting layers — remember I said there are two reference frames, and they're both voting. One of the cell types in layer 5 has long-range projections, and one of the cell types in layer 3 has long-range connections, and those are the ones doing the voting. Because nearby columns will typically be sensing the same thing — within some area here, these columns are all sensing the same object — this propagation of "hey, what are we all seeing?" happens over some broader area, and so you see those connections spread over it. But as I said, most of the circuitry is within a column. OK, I'm going to stop, and Subutai is going to pick up — and we're here all day, so if people want to talk about this longer, we can. [Subutai:] As Jeff said, we've been really focused on the neuroscience for a long time, and recently we've started re-engaging on AI, so my talk is going to be a little bit different. I'm going to focus on a couple of aspects of what we're doing in AI, and the basic idea is this: how can we take what we've learned from the neuroscience and the neocortex and apply it to practical systems, in a way that addresses some of the shortcomings and maybe improves some of the current techniques? My talk is going to focus on a couple of fundamental areas — Jeff didn't touch on these directly, but a lot of them underlie the theories we're working on. The first area is robustness, which is a pretty key area, and as we discussed, the brain
is remarkably robust and resilient to noise from the outside, internal faults, and all sorts of things. What we think is that the way the brain represents information — using sparse patterns of activity — is pretty critical in achieving this robustness. I'm going to talk about exactly how we think about that, and then about what we've done very recently: incorporating sparsity into deep learning networks in a way that mimics what we think is going on in the brain. I'll show you some of the results from that. The next two topics are more focused on the learning aspects: continuous learning and unsupervised learning. These are big research areas in machine learning as well as in neuroscience, and it turns out these two are actually handled in a very similar way in the brain — it's really the same mechanisms and operations that lead to both, so this is actually one topic, not two. To explain what we think is going on, I'm going to dive into a little more detail about how biological neurons work — how they operate on their inputs and how they learn. Although we haven't implemented this in the context of deep learning networks, we do have some results on real-world data sets, so I'll show you how the concept applies there. OK, so let's dive into sparsity and robustness. This is a picture of a pyramidal neuron, the most common type of neuron in your neocortex. Pyramidal neurons have thousands of synapses — anywhere from 3,000, in some places up to 30,000. What's shown here is the dendritic tree, where all the inputs come into the neuron, and in a single neuron the dendritic tree is chopped up into tiny little segments; within each segment, as few as 8 to 20 synapses can recognize a pattern. This is remarkable if you consider that thousands of neurons are sending input into it, and
this activity is extremely noisy. How can you possibly recognize patterns robustly using such a tiny fraction of the available connections? This is something that puzzled us for a while, and we think we understand some of the combinatorics and math behind it now, so I'll explain that here. Here's an abstract view of it — I'm going to consider binary sparse vector matching. Here's a stylized dendrite, and here's a set of n inputs feeding into it. You can represent the connections on the dendrite as a sparse binary vector with n components, where a 1 corresponds to an actual connection with a neuron; these connections are learned over time. You can also represent the input as a sparse binary vector with n dimensions, with a 1 being an active unit. What we care about here is matching: what errors can happen when you match? If you look at the dot product between these two vectors, it counts the overlap between them, and if the overlap is greater than some threshold, we say the segment has recognized the pattern. We can investigate that simple operation in this context and see what it looks like. Here's a picture of the space of possible vectors — it turns out the combinatorics of sparse vectors is really interesting as it relates to robustness and other properties. The gray circle represents all possible vectors, and the white circles represent individual dendritic segments' weights. So if you look at x1, for example, you might want to match patterns against that. Now there's a parameter theta, which controls how precise the match has to be, and the lower the theta, the more noise you can tolerate: if theta is really small, you can tolerate all sorts of changes to the vector and still match. The problem, of course, is that as you do that, the risk of false positives increases.
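The matching operation just described can be sketched directly. This is a toy illustration with made-up sizes (2048 inputs, a 24-synapse segment, theta of 12), representing binary vectors as sets of active indices so the dot product is just the overlap size:

```python
import random
random.seed(1)

N = 2048        # input dimensionality
K = 24          # synapses on the dendritic segment
THETA = 12      # match threshold (tolerates ~50% noise)

# The segment's connections: a sparse binary vector stored as an index set.
segment = set(random.sample(range(N), K))

def matches(active_inputs, segment, theta=THETA):
    """Dot product of two binary vectors = size of their overlap."""
    return len(active_inputs & segment) >= theta

# The stored pattern matches, even with nearly half its bits dropped.
stored = set(segment)
noisy = set(random.sample(sorted(stored), K - 10))   # keep 14 of 24 bits
print(matches(stored, segment), matches(noisy, segment))  # prints True True

# A random sparse input (128 active of 2048) almost never matches.
random_input = set(random.sample(range(N), 128))
print(matches(random_input, segment))
```

Dropping 10 of 24 bits still leaves an overlap of 14, above theta, while a random sparse input overlaps the segment in only a couple of positions on average.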
As you decrease theta, the volume of the set of vectors that match this white circle increases, but the space is fixed, so you're going to have a much higher chance of matching some other pattern that was potentially corrupted. We can count this and figure out exactly what this probability is. For completely uniform vectors, we can calculate exactly the ratio of the white sphere to the gray sphere: the numerator counts all of the patterns that will match a candidate vector, and the denominator is the size of the whole space. I'm not going to walk you through the derivation, but what's really interesting is that as you increase the dimensionality, the size of the space grows much, much faster than the size of these white circles, so the ratio of the two drops very rapidly toward zero. What this means is that you can maintain extremely robust, noise-tolerant matches with a fairly low theta and a very small chance of false positives — a pretty remarkable property of these sparse representations. This graph shows a simulated version of this. Here I have, let's say, a dendrite with 24 synapses and a theta of 12, so you can tolerate roughly 50% noise, and you're looking at inputs with varying levels of activity. What the graph shows is that the chance of false positives decreases exponentially with the dimensionality — this is the ratio I was describing — and as long as the inputs are sparse, you get extremely low error rates. Here, if the input has 128 active units and you have roughly 2,000 dimensions, your error rate is down around 10 to the minus 8 — pretty low. The other interesting thing to note is the horizontal dotted line: if the activity coming in is dense — in this case about half the units — the error does not decrease. The combinatorics are not in your favor in that case, so the error is more or less flat; the dimensionality doesn't impact it.
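The ratio being described — matching vectors over all vectors — can be computed exactly as a hypergeometric tail. The parameters below (24 synapses, theta of 12, 128 active inputs) follow the talk; the derivation being skipped is assumed to reduce to this standard counting argument:

```python
from math import comb

def p_false_match(n, k, a, theta):
    """Probability that a random binary vector with `a` active bits (of n)
    overlaps a fixed k-synapse segment in at least `theta` positions:
    the count of matching vectors divided by the size of the whole space."""
    hits = sum(comb(k, b) * comb(n - k, a - b)
               for b in range(theta, min(k, a) + 1))
    return hits / comb(n, a)

K, THETA, A = 24, 12, 128               # segment size, threshold, sparse input
for n in (500, 1000, 2000, 4000):
    print(n, p_false_match(n, K, A, THETA))      # drops rapidly with n

# Dense activity (half the units on): the error stays roughly flat in n,
# like the horizontal dotted line on the slide.
for n in (500, 1000, 2000, 4000):
    print(n, p_false_match(n, K, n // 2, THETA))
```

With sparse inputs, the false-match probability falls by orders of magnitude as dimensionality grows; with dense inputs, the overlap behaves like a fair-coin binomial over the 24 synapses, so the error barely moves.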
So as long as things are sparse and high dimensional, you get into this really nice regime where things are extremely robust. We wanted to see if this kind of property would hold within deep networks as well. Of course, with deep networks you don't work with binary vectors, you work with scalar-valued vectors, so we wanted to see if the same property would hold there — and it turns out it does. This is a similar simulation, except with scalar vectors, and the combinatorics I alluded to still work for scalar vectors and a dot product: any component that is 0 doesn't affect the match, so those basic combinatorics are still in play. However, with scalar vectors the magnitudes of the values matter, so in order to get this nice regime, normalization is important — you have to make sure both of your vectors are roughly in the same range. As long as you're careful about that, you get the same basic properties; the error rates aren't quite as nice as in the binary case, but they still decrease exponentially with dimensionality. [Question about the matching function:] Here I'm specifically focused on dot products, but you could imagine other functions working; the key thing is that you want to ignore the zeros — if either component is zero, it shouldn't count. [Question about uniformity:] Yes, this relies on a uniform distribution of vectors, and in reality it's not going to be perfectly uniform; the less uniform it is, the worse these properties get. I'd flip it around and say it's the job of the learning algorithm and the system to enforce uniformity — to maximize entropy — as much as possible. It's an important point, and it's hard to measure exactly in the brain, but correlations in the brain are in general extremely low. OK, so how can we put this into deep learning systems? What I've done is create a differentiable sparse layer. On the left I'm showing a
vanilla hidden layer in a neural network: you have some input from the layer below, a linear weighted sum of those inputs, followed by a ReLU or some other nonlinearity. The sparse layer I've created is very similar to that; the main differences are as follows. First, the weight matrix, instead of being dense, is sparse — most of the weights are actually zero, and they're maintained at zero throughout, as if those connections just didn't exist. Second, the ReLU is replaced by a k-winners layer, which keeps the outputs of the top k units and sets the rest to zero. With ReLU you keep anything above zero; here we keep only the top k units, and you can treat the gradient exactly as you do with ReLU — it's one for the winners and zero for everything else. The one problem is that in this formulation it's very easy for a few units to win out and stay strong, so you don't get the uniform distribution you want. So we've included a boosting term that favors units with low activation frequency: there's some target level of activity, determined by the sparsity of the layer, and if a unit's average activation is below that target, you boost its chance of winning in the sorting. The output is not affected by the boosting term — it only affects which units are chosen as winners — and this helps maximize the overall entropy; we've shown this in some past papers. So it's a very simple construction: sparse weights and sparse activations. You can also create convolutional layers using the same mechanism; in the results I'll show you, I did not use sparse weights for the convolutions, because the filter sizes are pretty small, but in principle you could do the exact same thing there. We've tried this on two data sets, MNIST and Google Speech Commands, and I'll show those results. Here I'm showing one- and two-layer dense networks and one- and two-layer sparse networks. For MNIST, state-of-the-art test set accuracy without data augmentation is between 98.3 and 99.4, and both kinds of network are in that range — the dense networks are a little bit better, as you can see. But what's really interesting is when you start testing with noisy data sets. This plot shows accuracy as you increase the level of noise in the input, and you can see that the sparse networks do dramatically better than the dense networks. Here are some examples of noisy versions of MNIST images with the dense and sparse results: with 10% noise they're still about the same; with 30% noise in the inputs, the sparse network still does really well and the dense one doesn't; and here it is with 50% noise. So that was encouraging, and we wanted to try harder data sets as well, so I looked at the Google Speech Commands data set. This is something Google released a couple of years ago — 65,000 utterances of one-word phrases. It's harder than MNIST; state-of-the-art is around 95 to 97.5 percent for 10 categories. Again I tested accuracy with noisy sounds as well, and as before I have two different two-layer dense networks and two different sparse networks. The basic test set accuracies are about the same for dense and sparse in this case, but the noise score — you can think of it as an area under the curve, the total number of correct classifications across all the noise levels — again shows the sparse networks doing significantly better than the dense networks, as we expected from the math. One interesting case is the super-sparse network, where only 10% of the weights are nonzero in the upper layers.
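The k-winners-plus-boosting construction described above can be sketched in a few lines. This is a framework-free toy, not Numenta's actual implementation; the exponential form of the boost and the numbers below are illustrative assumptions.

```python
import math

def k_winners(activations, k, duty_cycles, boost_strength=1.0):
    """Keep the top-k units (ranked by boosted score); zero the rest.

    Units whose recent activation frequency (duty cycle) is below the
    target get their score boosted, which spreads activity more uniformly
    over time. Boosting only affects which units win, never the output
    values themselves.
    """
    n = len(activations)
    target = k / n                      # target duty cycle set by layer sparsity
    boosted = [a * math.exp(boost_strength * (target - d))
               for a, d in zip(activations, duty_cycles)]
    winners = set(sorted(range(n), key=boosted.__getitem__, reverse=True)[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]

x = [0.1, 0.9, 0.5, 0.3, 0.8, 0.05]
duty = [0.5, 0.5, 0.0, 0.5, 0.5, 0.5]   # unit 2 has been quiet lately
# Unit 2 out-competes unit 4 despite a lower raw activation, because its
# low duty cycle earns it a boost; the surviving values are unchanged.
print(k_winners(x, k=2, duty_cycles=duty, boost_strength=3.0))
```

In a real network the duty cycles would be running averages updated each forward pass, and the backward pass would route gradients only through the winners, exactly as ReLU routes them through positive units.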
It's remarkable how well that holds up. [Q&A:] Yes — to really get the properties, both vectors have to be sparse: the weight vectors as well as the input vectors. If you're only looking at test set accuracy, either one alone will work, but to get the noise robustness properties you need both of them to be sparse; otherwise it's not as good. And the 2% number is also interesting — that's roughly the level of sparsity you see in a lot of areas in the brain, which is an interesting fact. [Question about dropout:] We tried training with dropout, and it hurts the sparse networks. Another way to say it is that you don't need to use dropout. In the dense networks it sometimes helps — you have to really tune the dropout rate — but in no case does it get anywhere close to the sparse networks. Dropout as a regularizer is known to help a little and sometimes helps with test set accuracy, but if you use sparse networks you just don't need to worry about dropout. OK, I'm going to switch to the second topic, unsupervised and continuous learning, and here I'm going to dive back into the neuroscience briefly. Here's our favorite neuron again, the pyramidal neuron. It has about 3,000 to 10,000 synapses, as I mentioned, and these neurons have a very complicated dendritic morphology, with different functional properties in different areas. This green area near the center of the cell, near the soma, is where most of the feedforward inputs go, and it acts like your typical neuron — a weighted sum plus a nonlinearity. These inputs tend to drive the cell; it's the classic point neuron. But the amazing thing is that this is actually
only 10% of the synapses on the cell. 90% of the synapses are in these other, distal areas — the blue areas — and as I mentioned earlier, in these areas as few as 8 to 20 clustered synapses can detect a pattern. They generate what's known as a dendritic spike, an NMDA spike, and this spike travels to the center of the cell but does not cause the cell to fire. So you get this recognition event that seems to have no impact on the cell's firing rate, but if you look inside the cell, it turns out it does prime the cell to fire more strongly in the future. These neurons can detect hundreds of these independent sparse patterns throughout the dendritic tree, all completely independently. For a long time it was really puzzling: what is the point of all these synapses if they have no direct impact? What we think is going on is that these dendritic areas play different functions. The feedforward input defines the basic pattern the cell recognizes. The synapses in the lower distal areas detect sparse local patterns of activity of nearby neurons, and these act as contextual predictions: when one of these patterns is detected, it primes the cell to fire more strongly in the future. The synapses up at the top receive top-down inputs, mostly from regions above; these also detect sparse patterns, and they invoke top-down expectations — a slightly different kind of prediction, but still a prediction. So what's going on is that you have this neuron that's trying to predict its own activity in lots of different contexts. And if you look at the learning rules people have discovered, there are really three very basic learning rules outside of this green area. The basic rules are: if a cell becomes
active — if it fires because of the green feedforward input — and there happened to be a prediction in the past, meaning some dendritic segment predicted it, then we reinforce that segment: we reinforce only the synapses in that segment, not in the rest of the cell. If there was no prediction but the cell still fires, we start growing connections on a new segment of the cell, and those connections subsample from the input that's coming in — it's a sparse sampling. And if the cell was not active but there was a prediction, that means it was an incorrect prediction, and we slightly weaken the segment that caused it. So: three very simple learning rules. Just to point out — learning here consists of growing new connections, and each learning event is like creating a subsample, a very sparse vector. Each neuron can be associated with hundreds of these sparse contextual patterns, and essentially each neuron is constantly trying to make predictions and learn from its mistakes. Notice there's no supervision here, and no notion of a batch — these learning rules are occurring constantly, so everything is continuously learning. And it turns out that because these vectors are sparse — remember, in the right regime they're really far apart in the space — they don't interfere with one another, so you can keep learning new things without corrupting previously learned things. That's another benefit of these highly sparse representations. You can build a network of these neurons — we've done it — and you get a very powerful predictive learning algorithm. I'm not going to walk you through the details of the algorithm.
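The three learning rules above can be sketched as follows. This is a deliberately stripped-down toy — the permanence values, thresholds, and sampling size are made-up parameters, and it omits almost everything a real HTM temporal memory does — but each branch corresponds to one of the three rules.

```python
import random
random.seed(2)

class Segment:
    """One dendritic segment: a sparse set of connections to other cells."""
    def __init__(self):
        self.synapses = {}              # presynaptic cell id -> permanence

    def predicts(self, prev_active, theta=3):
        """Active if enough of its synapses saw previously active cells."""
        return sum(c in prev_active for c in self.synapses) >= theta

INC, DEC, SAMPLE = 0.1, 0.02, 5         # toy learning parameters

def learn(segments, predicting_segment, cell_fired, prev_active):
    if cell_fired and predicting_segment is not None:
        # Rule 1: correct prediction -> reinforce only the synapses on the
        # segment that predicted, nothing else on the cell.
        for c in predicting_segment.synapses:
            if c in prev_active:
                predicting_segment.synapses[c] += INC
    elif cell_fired:
        # Rule 2: unpredicted firing -> grow a new segment whose synapses
        # sparsely subsample the previous activity.
        seg = Segment()
        for c in random.sample(sorted(prev_active), min(SAMPLE, len(prev_active))):
            seg.synapses[c] = 0.2
        segments.append(seg)
    elif predicting_segment is not None:
        # Rule 3: false prediction -> slightly weaken the predicting segment.
        for c in predicting_segment.synapses:
            predicting_segment.synapses[c] -= DEC

# A cell fires unpredicted in context {1, 5, 9, 12, 20} (rule 2), then is
# correctly predicted in that same context by the segment it grew (rule 1).
prev_active = {1, 5, 9, 12, 20}
segments = []
learn(segments, None, True, prev_active)
learn(segments, segments[0], True, prev_active)
print(segments[0].predicts(prev_active))   # prints True
```

Note there is no batch, no labels, and no global objective: every time step applies one of the three local rules, which is the sense in which learning is continuous and unsupervised.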
But essentially, you have groups of cells, with each cell associating some past activity — just one time step in the past — as context for its current activity. It learns continuously, and it generally does not forget past patterns, because of the sparse representations. These networks can actually learn really complex, high-order Markov sequences, meaning the current state can depend on input that happened many time steps in the past, even though the learning rule only looks at the previous state — there's a kind of dynamic programming aspect to it. And everything is sparse, so not only can you learn continuously without forgetting, these networks are also extremely fault tolerant. I'll show one simple result, published a couple of years ago. This network works really well with streaming data sources. Here's a case of New York City taxi demand — a data set released by the New York City Metropolitan Authority — and you see a typical weekly pattern, these seven bumps. The basic task is to predict taxi demand in the future. If you look at the prediction error, we've tested our network — these are HTM networks, which stands for hierarchical temporal memory, the name of our algorithm — against a bunch of other techniques, and the error rate of the HTM is approximately the same as the best LSTM network. So they're about the same. However, what's interesting is what happens when the statistics change. The HTM networks, because they're continuously learning, adapt very rapidly to changes in the statistics. This is error rate over time, and here's the point where the statistics of the sequences changed: you can see that the error for both HTM and LSTM goes up pretty high, but then the HTM error rapidly drops back to the baseline rate, whereas the LSTM takes quite a long time before it drops back.
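The recency point can be illustrated without either an HTM or an LSTM. Below is a minimal toy contrast — entirely made-up numbers — between a batch-trained predictor frozen after training and an always-updating online one, when the stream's statistics shift halfway through:

```python
# A stream whose statistics change halfway through: mean 10, then mean 30.
stream = [10.0] * 200 + [30.0] * 200

# "Batch" model: fit once on the first half, then frozen.
batch_prediction = sum(stream[:200]) / 200

# Continuous learner: an exponentially weighted estimate, updated on
# every single sample, so recent data always dominates.
alpha = 0.1
online_prediction = 0.0
online_errors, batch_errors = [], []
for x in stream:
    online_errors.append(abs(x - online_prediction))
    batch_errors.append(abs(x - batch_prediction))
    online_prediction += alpha * (x - online_prediction)  # online update

# Shortly after the shift, the online model has already recovered;
# the frozen batch model never does.
print(online_errors[250], batch_errors[250])
```

Fifty steps after the shift, the online error has decayed back to near zero while the frozen model is still off by the full shift — the same qualitative shape as the HTM-versus-LSTM error curves described here, though of course this toy is far simpler than either.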
drops back. And this is true even if you keep retraining the LSTM; you can play with the retraining window and all of that, and it doesn't matter. That's because LSTMs are fundamentally batch systems: there's no notion of recency in the samples, and it takes a long time before the changed statistics become a significant percentage of the overall data set. And if you don't train on enough data, the error is just high all over the place. So the kind of continuous learning that I described is perfect for adapting really quickly to changing statistics. Just as a summary again: the way that neurons operate and the way these dendritic segments operate leads to a very simple, continuous, unsupervised learning rule that can learn continuously without forgetting previous patterns. Okay, so I've talked about robustness and continuous learning. We have a long roadmap of things to do, as Jeff alluded to. Within robustness, we've only tried relatively small problems so far, so I'd like to try much larger problems, and it would be really interesting to test with adversarial systems as well, to see whether these sparse networks actually hold up against many of the adversarial attacks. Continuous learning has not yet been integrated into a deep learning system; the algorithms I showed you were just one-layer systems. But I think they can be integrated in, keeping the same philosophy, and we can implement these predictive learning rules within deep learning systems, and I think that may help enable continuous learning and unsupervised learning in a very rich way. And then beyond that there's the full thousand brains idea. Jeff talked about the voting mechanisms, and voting across sensory modalities and across regions will add some really interesting robustness properties, as well as other properties. We want to move to a case where there are
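The qualitative point, that a continuously learning model tracks a shift in the data while a batch-trained model keeps making stale predictions, can be shown with a deliberately trivial toy example. This stands in for the HTM-versus-LSTM comparison; the "models" here are just a fixed mean versus a running mean, which is my own simplification, not either algorithm.

```python
import random

def run(stream, alpha=0.1, train_n=200):
    """Compare a fixed batch estimate against an online running mean."""
    batch_pred = sum(stream[:train_n]) / train_n   # trained once, then frozen
    online_pred = batch_pred
    batch_err, online_err = [], []
    for x in stream[train_n:]:
        batch_err.append(abs(x - batch_pred))
        online_err.append(abs(x - online_pred))
        online_pred += alpha * (x - online_pred)   # continuous learning step
    return batch_err, online_err

random.seed(0)
stream = [random.gauss(10, 1) for _ in range(400)]    # original statistics
stream += [random.gauss(20, 1) for _ in range(400)]   # statistics change
batch_err, online_err = run(stream)
# After the shift, the online learner's error returns to baseline within a
# few dozen steps; the frozen batch model's error stays near the shift size.
print(sum(batch_err[-100:]) / 100, sum(online_err[-100:]) / 100)
```

The recency-weighted update is the key design choice: each sample moves the estimate a little, so a change in the stream's statistics is absorbed quickly instead of being diluted by the entire training history.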
many, many small models across sensory modalities, each hypothesizing what its sensory inputs are detecting, and then voting to resolve ambiguity. And then I think it's critical to move to scenarios where, at every layer of a deep learning system, you have inherent object-specific reference frames. This is going to allow much faster learning and much better generalization, because you'll be much more invariant to changes in perspective, for example, if you can represent things in their own reference frames. Okay, one of the reasons we're here is to see if there are any opportunities for collaboration, and we'd love to discuss with any of you who are interested in these ideas and see how we can apply them together. Here are some ideas for possible projects across the range of applications. I mentioned this idea of testing robustness against adversarial systems; this is not something we have a lot of expertise in, and it would be great to work with someone who's looking into that. In security there are tons of implications here as well. I think we could test in different domains such as robotics, natural language processing, and the Internet of Things, and as we incorporate these ideas as differentiable systems they can be applied to just about any deep learning architecture and paradigm, including of course recurrent networks and reinforcement learning and so on. So it would be great to work with people who have expertise in those areas and really see how far we can scale these things. And then, specifically in terms of larger problems: MNIST and Google Speech Commands are still relatively toy problems, and we want to attack much larger problems. As a small lab it's really hard for us to do ImageNet-style work, so we'd love to work with anyone who wants to scale these networks. There are also a lot of really interesting things that can be done in terms of acceleration and power efficiency. I didn't really talk about the power advantages of
sparse representations, but they're pretty dramatic. Unfortunately, sparse computations are not very well suited to GPUs; they're really well suited to FPGAs and other architectures, so I think accelerating these things is going to be quite challenging and interesting. Okay, so hopefully some of you are interested in these things; come talk to us. There's a picture of our research team and our contact info here. Thank you. [Applause] So, how do you want to handle it now? We went over our hour, so we're here; we'll hang out and answer questions. [Audience question] So I think the question, if I understand it correctly, is: when we look at a flat screen, maybe even with just one eye, so we don't even have stereoscopic vision at all, how is it that we rebuild this depth model of the world? It's a great question, and we have some clues as to the answers. I talked about those grid cells as representing location. It looks like they are not only driven by your internal motor commands, but they can also be driven by sensory cues such as flow. I've used this example a lot: imagine you're watching someone play a first-person shooter game. I never play these games, but you're going through this maze, you're running around, you're following this thing, and you know where the player is and where they are on the map. So all of that is happening, the whole grid cell mechanism and all those location representations, even though there isn't any depth information. So it doesn't require that you have some three-dimensional camera, and it doesn't require stereoscopic vision; what it requires is that there are sensory cues such as flow. And if you look at the neocortex, these things are highly represented: various types of vector flow fields
that are occurring are driving the system, even though there isn't a three-dimensional sensor. It's an interesting question: why, when I'm looking out at you right now, do you appear out there and not on my retina? My perception is that you are there, even though the image is actually on my retina. It's really the same problem in some sense: how do I know you're there and not here? And it's because of these sensory cues that are extracted early on. If you look at all the sensory streams, both tactile and visual, they have this sense of motion and flow built into them: you have tactile sensors which detect movement, and you have the equivalent in your retina as well. So I think that's the general answer to the question, though maybe not a super detailed one. It's a fascinating thing; I just try to expand on it to make you realize how crazy it is that you perceive everything at a distance all the time. In hindsight it's so obvious that everything has a location representation, but this was not obvious three years ago. We just said, I know it's out there, but how do you know it's out there? What kind of neurons are representing it? Anyway, it's a great question, and we don't really have all the answers, but that's the basic gist. [Audience question] So the question is about language, which is less physical: how do we understand this theory in terms of that? We don't really understand it yet, but I'll give you a couple of clues. First of all, language consists of words, and whether they're written words, spoken words, or sign language, those are all physical objects that you can model. So you've got columns in your auditory cortex building auditory models of words, and you've got columns in your visual cortex building visual models of words, and they can vote. That's why I can hear a partial thing and see something, maybe watch your lips, and
I push these together. So we start off with atoms of language that are really physical objects, but then how do they get their conceptual nature? We don't know the answer to that question yet, but there are some very interesting things that have happened recently in neuroscience. We derived this theory from the question of how it is that I touch objects or see objects. There's a whole other body of research, just from the last couple of years, where they're studying humans using fMRI, and they've shown that even when you're thinking about conceptual objects in your head, when you're imagining various things, there's evidence that there are grid cells underlying it. They've discovered this using very clever imaging techniques; you can sit in an fMRI machine while it shows you, say, pictures of birds. No one really understands this yet, but the evidence is very clear that this is what's going on. So how do you map conceptual objects like words into this location space? We don't really know yet. I mentioned the issue of recursion, which is a big part of what language is, if you read Chomsky and so on. All these things are triangulating, saying that this is what's going on; we just don't understand it completely. But we do know it's going to be based on location frames, it's going to be based on recruiting reference frames; we have all this evidence triangulating on that, and that's just a fascinating thing to think about. To me, the big hurdle we overcame was understanding how this reference frame concept applies throughout everything, and now it's more a matter of turning the crank, going down these different pieces and explaining how we do all these different components. How do we put together a broader theory of concepts and language and abstract ideas? We don't really know yet, but all the pieces are there, so you just have to
put them together and think about them correctly; that's my thinking. [Audience question] When you say different brain regions, do you mean outside the neocortex, the older parts? Well, within the cortex, we don't explicitly state that, but I kind of alluded to it earlier: a column has some input, and it's going to model that input; it's going to essentially build a sensory model of it. That input can't be very large; it has to be fairly small, because you can't have a super-high-dimensional input to a single column. What you see in the topology of the brain is that some part of the retina projects to a single column, another part projects to another single column, and so on. If you could build a visual system with one column, it would be like looking through a straw, and that would be a complete visual system: it would learn by moving the straw around, like moving a little window around and looking at stuff, and that would work. It's pretty straightforward to expand out to a whole bunch of those working at the same time. So we haven't modeled that per se; our focus has been on what the column does, with the belief that once you understand what the column does, the rest of it becomes pretty easy. The tricky part really is figuring out columns. So we're not trying to scale up to that; we abandoned all that a while ago. We just said, let's focus on one column, and we've got to nail that one column. So all of our simulations have been very small, smaller than hundreds of thousands of neurons, on the order of tens of thousands, all of which would be contained in a single column. To scale up to human-brain-size stuff: it's not theoretically important at this point in time, and we don't really have the ability to do it anyway, but I think it would be fairly simple to do. [Audience question] Yes, we do. I
didn't talk about mini-columns. For those who don't know, a mini-column is a physical structure you can see in the neocortex, somewhere between 30 and 80 microns wide; that's really small, and there are several hundred of them in a cortical column. One point to note: mini-columns are only visible in primates, so people who study rats don't see them. That doesn't mean they don't exist; the functional equivalent could be there, you just can't see it. The network that Subutai mentioned earlier, the one that learns sequences and is built on this neuron model, and in fact all the networks we've talked about here, are built on mini-columns. I can tell you briefly what we think. One of the ways we use a mini-column is this: all the cells in a mini-column have the same basic feed-forward response property. This is known neuroscience: if I find a V1 neuron that responds to an edge, all the cells in that mini-column have the same sort of visual response property. But in the context of a real-world animal moving about and observing real things, the activity becomes very sparse, so that at any point only one of those cells becomes active. The mini-column is a way of taking an input and sparsifying it in context. If I have no context, all the cells become active, and I basically say, I have some set of features; in context, I have a unique representation of that input. It's the same input, but a very unique representation. All of this is detailed, in gory detail, in the 2016 paper. And right at the moment I'm expanding the concept of mini-columns, because I actually think, though I'm not certain of this yet, I talked about these grid cell modules, I'm working on the
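The "same input, unique representation in context" behavior can be sketched as follows: each mini-column stands for one feed-forward feature, and context (a prediction) selects which single cell in that column fires; with no prediction, the whole column bursts. The column/cell-pair representation and the size of 32 cells per column are illustrative assumptions on my part, not values from the 2016 paper.

```python
def minicolumn_activate(active_columns, predicted_cells, cells_per_column=32):
    """Return the set of active (column, cell) pairs.

    active_columns : columns whose feed-forward feature is currently present
    predicted_cells: (column, cell) pairs depolarized by prior context
    """
    active = set()
    for col in active_columns:
        winners = [(col, c) for c in range(cells_per_column)
                   if (col, c) in predicted_cells]
        if winners:
            # Context match: only the predicted cell(s) fire, giving a
            # sparse, context-specific representation of the same feature.
            active.update(winners)
        else:
            # No prediction: every cell in the mini-column bursts, meaning
            # "feature present, context unknown".
            active.update((col, c) for c in range(cells_per_column))
    return active
```

For example, with a prediction `{(3, 7)}`, column 3 yields a single active cell; without any prediction, the same feed-forward input lights up all 32 cells of the column.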
idea that the grid cell modules might actually be one per mini-column. They're very small, so each mini-column could be representing not only a feature but also a part of the location space; it would be integral to the structure. None of this has to be done this way in an artificial neural network; you don't have to have these physical structures, you can arrange things any way you want. But in the brain it looks like mini-columns do have roles, though we don't know exactly what they are. [Audience question] This was the clue. There are two ways we did that. With the sequence memory, remember Subutai talked about the taxi data, the context there was the previous state of the same cells: it's like, where am I in this sequence, and this previous state tells me what to predict next. All we did to get this whole sensorimotor inference system working was add another input, which is the grid cells. The grid cells are basically saying, you can look at your previous state, but you can also look at the location, and you use whichever works better. So it's spatial and temporal, and we believe that the layer 4 cells actually learn on their own which is the proper context in which to make a prediction. [Audience question] No, we have no wet lab. Okay, I talked about this briefly, so I'll just go through it again. There are two basic ways: one is verifying via empirical data, and the other is via simulation. For the empirical data, the first thing we do is go through the existing literature. There are just incredible amounts of existing literature that nobody knows about, simply because we've all forgotten it. So we say, hey, we have this idea that dendrites ought to do this; can we find evidence for it? Sometimes we find supporting evidence, and sometimes we find falsification, and we can very often find falsification for our theories, in which case we go back to square one. We don't accept anything which is not biologically accurate. Then we go and talk to
experimental labs, and we say: here's what we think, what do you think about this? Sometimes they can find data that they haven't published; they say, yeah, we have this data, we didn't think about it that way, but let's go look at it. That's been very fruitful, so we have collaborations with labs. Some people want to test our theories with new experiments, but that takes forever; a typical rat experiment could take two years from start to finish. Some of that is going on, but we can't wait for it. There's another thing I'll point out, something I like to use, though some people think you shouldn't: the number of constraints your theory is satisfying at any point in time is an indication of how good it is. Say there are 25 constraints, these biological constraints: we know the neurons look like this, and the synapses do this and this and this. At first this makes your theories much, much harder, because how do you satisfy all those constraints simultaneously? It's almost impossible. But when you actually get an answer which satisfies many constraints, you're almost certain it's right, and that has proven true over and over again. It's not proof, but it works. If you loosen up your constraints and just say, give me some biological inspiration, you can come up with anything. If you really want to satisfy the real biology, it is really hard; it is not easy to come up with solutions, trust me, and so it takes us a long time. [Audience question] First of all, I 100% agree with you. I think we have to move beyond just layers of simple point neurons, and that's sort of what I alluded to here. For me the first step was really making sure we have sparsity handled, and now I'm focusing on integrating the neuron model, so not just a point neuron but the dendritic structure, and then larger-scale structures like the mini-column and
the cortical column. That's going to be necessary for including object-centric reference frames and for including motor input and having a true sensorimotor predictive system, and I think this is going to pay huge dividends down the road. We think the architecture we've talked about here is going to be the standard view twenty years from now; this is what people are going to be building. So how do you get there? How do you move people in that direction? These are complex things to implement, so we're thinking through that process: how do we get there, one step at a time. Think about what Geoff Hinton did recently with his capsules. He had the intuition that some sort of location and relative-location structure had to be important, but he doesn't have the deep neuroscience knowledge that we have, so we have a much richer idea of what's going on there than he does. But it's the same basic intuition: we need to move to a different sort of representational framework, one that incorporates much more of the sophistication that's in a column, if you will. Right now the individual units in these networks are really very simple; it's almost like one point neuron. Maybe I'll point out that convolutional neural networks were originally inspired by the biology a little bit: you have your filters, your feature detectors, followed by a pooling step. That corresponds, in this kind of diagram, to input coming into layer 4, going up to layers 2/3, and then up to the next level. But if you count the number of synapses in a cortical column that match that model, it's less than 1%; it doesn't match 99% of what's going on in our brain, if you look at the individual connections. So all of this other complexity, which a convolutional neural network lacks, has to be incorporated eventually, we
think, in order to get a truly intelligent system. [Audience question] No, I mean, I think capsules is the closest that I've seen, and it does incorporate some of these intuitions, but by and large, no; I think most people haven't moved in this direction. We've only had the insight for three years; the first publications were a year ago, and we just published the main framework paper about this in December. I presented it to about 700 neuroscientists in October in Europe at the Human Brain Project meeting. It's all very new; you won't find this location stuff anywhere else in the neuroscience literature. So we're just starting on this path, and one of the reasons we're here is to see how quickly we can get people to work with us on it. [Audience question] What do I think? This is a tremendous research roadmap for machine intelligence in general. There are so many rich ideas in here, and we know this is how the brain works, we have strong confidence in this now, and we can point to the exact benefits each of these structures is going to have down the road for practical systems. So things do have to move this way. If you go to numenta.com there's a papers section; they're all listed there. Okay, and you've got our emails as well. Thank you.
Info
Channel: Microsoft Research
Views: 34,447
Rating: 4.9313827 out of 5
Keywords: microsoft research
Id: 5LFo36g4Lug
Length: 90min 6sec (5406 seconds)
Published: Mon Mar 25 2019