Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning | Lex Fridman Podcast #258

Captions
The following is a conversation with Yann LeCun, his second time on the podcast. He is the Chief AI Scientist at Meta, formerly Facebook, a professor at NYU, a Turing Award winner, one of the seminal figures in the history of machine learning and artificial intelligence, and someone who is brilliant and opinionated in the best kind of way, and so is always fun to talk to. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, here's my conversation with Yann LeCun.

You co-wrote the article "Self-Supervised Learning: The Dark Matter of Intelligence", great title by the way, with Ishan Misra. So let me ask: what is self-supervised learning, and why is it the dark matter of intelligence?

I'll start with the dark matter part. There is obviously a kind of learning that humans and animals are doing that we currently do not reproduce properly with machines, with AI. The most popular approaches to machine learning today, or paradigms I should say, are supervised learning and reinforcement learning, and they are extremely inefficient. Supervised learning requires many samples for learning anything, and reinforcement learning requires a ridiculously large number of trials and errors for a system to learn anything, and that's why we don't have self-driving cars.

That's a big leap from one to the other. So to solve difficult problems, you have to have a lot of human annotation for supervised learning to work, and to solve those difficult problems with reinforcement learning, you have to have some way to simulate the problem so that you can do the large-scale learning that reinforcement learning requires.

Right. So how is it that most teenagers can learn to drive a car in about 20 hours of practice, whereas even with millions of hours of simulated practice a self-driving car can't actually learn to drive itself properly? Obviously we're missing something. The immediate response you get from many people is, well, humans use their background knowledge to learn faster, and they're right. Now, how was that background knowledge acquired? That's the big question. So you have to ask: how do babies in the first few months of life learn how the world works? Mostly by observation, because they can hardly act in the world, and they learn an enormous amount of background knowledge about the world that may be the basis of what we call common sense. This type of learning is not learning a task, it's not being reinforced for anything, it's just observing the world and figuring out how it works: building world models, learning world models. How do we do this, and how do we reproduce this in machines? Self-supervised learning is one instance, or one attempt, at trying to reproduce this kind of learning.

Okay, so you're looking at just observation, not even the interacting part of a child: it's just sitting there, watching mom and dad walk around, pick up stuff, all of that. That's what you mean by background knowledge?

Perhaps not even watching mom and dad, just watching the world go by. Just having eyes open, or having eyes closed, or the very act of opening and closing eyes, so that the world appears and disappears: all that basic information.

And you're saying that, in order to learn to drive, the reason humans are able to learn to drive quickly, some faster than others, is because of the background knowledge: they were able to watch cars
operate in the world in the many years leading up to it, the basic physics of objects, all that kind of stuff.

That's right. I mean, the basic physics of objects; you don't even need to know how a car works, because that you can learn fairly quickly. The example I use very often is: you're driving next to a cliff, and you know in advance, because of your understanding of intuitive physics, that if you turn the wheel to the right, the car will veer to the right, run off the cliff, fall off the cliff, and nothing good will come out of this. But if you are a sort of tabula rasa reinforcement learning system that doesn't have a model of the world, you have to repeat falling off this cliff thousands of times before you figure out it's a bad idea, and then a few more thousand times before you figure out how not to do it, and then a few more million times before you figure out how not to do it in every situation you ever encounter.

So self-supervised learning still has to have some source of truth being told to it by somebody, and you have to figure out a way, without human assistance, or without a significant amount of human assistance, to get that truth from the world. So the mystery there is: how much signal is there, how much truth is there that the world gives you, whether it's the human world, like you watch YouTube or something like that, or the more natural world? How much signal is there?

Here's the trick: there is way more signal in a self-supervised setting than there is in either a supervised or a reinforcement setting. This goes to my analogy of the cake, the "LeCake" as someone has called it, where you try to figure out how much information you ask the machine to predict and how much feedback you give the machine at every trial. In reinforcement learning, you give the machine a single scalar: you tell the machine "you did good," "you did bad," and you only tell this to the machine once in a while. When I say "you," it could be the universe telling the machine, but it's just one scalar. As a consequence, you cannot possibly learn something very complicated without many, many trials where you get many feedbacks of this type. In supervised learning, you give a few bits to the machine at every sample: let's say you're training a system on recognizing images on ImageNet; there are 1,000 categories, so that's a little less than 10 bits of information per sample. But in the self-supervised setting, ideally, and we don't know how to do this yet, you would show a machine a segment of video, then stop the video and ask the machine to predict what's going to happen next. You let the machine predict, then you let time go by and show the machine what actually happened, and you hope the machine will learn to do a better job at predicting next time around. There's a huge amount of information you give the machine, because it's an entire video clip of the future after the clip you fed it in the first place.

So for both language and vision there's this subtle, seemingly trivial construction, but maybe that's representative of what is required to create intelligence, which is filling in the gaps. It sounds dumb, but is it possible to solve all of intelligence in this way? For language, you give a sentence and continue it, or you give a sentence with a gap in it, some words blanked out, and you fill in what words go there. For vision, you give a sequence of images and predict what's going to happen next, or you fill in what happened in between.
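A minimal sketch of the fill-in-the-blank objective described above, for the language case: hide a fraction of the words and train a network to predict them from the rest. The vocabulary size, the tiny transformer, and the roughly ten-percent masking rate are illustrative assumptions, not anyone's actual setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK_ID, DIM = 10_000, 0, 128   # assumed sizes; token id 0 is reserved for [MASK]

class TinyMaskedLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True),
            num_layers=2)
        self.out = nn.Linear(DIM, VOCAB)   # a score for every word in the dictionary

    def forward(self, tokens):
        return self.out(self.encoder(self.embed(tokens)))

def masked_lm_loss(model, tokens, mask_prob=0.10):
    mask = torch.rand(tokens.shape) < mask_prob     # blank out ~10% of the words
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits = model(corrupted)
    # the loss is computed only at the positions that were hidden
    return F.cross_entropy(logits[mask], tokens[mask])

model = TinyMaskedLM()
batch = torch.randint(1, VOCAB, (8, 32))            # eight fake "sentences" of 32 tokens
loss = masked_lm_loss(model, batch)
loss.backward()
```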
Do you think it's possible that that formulation alone, as a signal for self-supervised learning, can solve intelligence for vision and language?

I think that's our best shot at the moment. Whether this will take us all the way to human-level intelligence, or just cat-level intelligence, is not clear, but among all the possible approaches that people have proposed, I think it's our best shot. So I think this idea of an intelligent system filling in the blanks, either predicting the future, inferring the past, or filling in missing information... I'm currently filling in the blank of what is behind your head, what your head looks like from the back, because I have basic knowledge about how humans are made. I don't know what you're going to say, at which point you're going to speak, whether you're going to move your head this way or that way, which way you're going to look, but I know you're not going to just dematerialize and reappear three meters down the hall, because I know what's possible and what's impossible according to intuitive physics.

So you have a model of what's possible and what's impossible, and then you'd be very surprised if the impossible happens, and you'd have to reconstruct your model.

Right. So that's the model of the world; it's what fills in the blanks. Given your partial information about the state of the world, given by your perception, your model of the world fills in the missing information, and that includes predicting the future, retrodicting the past, and filling in things you don't immediately perceive.

And that doesn't have to be purely generic visual information or generic language; you can go to specifics, like predicting what control decision you make when you're driving in a lane. You have a sequence of images from a vehicle, and you have information, if you recorded it on video, about where the car ended up going, so you can go back in time and predict where the car went based on the visual information. That's very domain-specific.

Right, but the question is whether we can come up with a sort of generic method for training machines to do this kind of prediction, or filling in the blanks. Right now, this type of approach has been unbelievably successful in the context of natural language processing. Every modern natural language processing system is pre-trained in a self-supervised manner to fill in the blanks: you show it a sequence of words, you remove ten percent of them, and then you train some gigantic neural net to predict the words that are missing. Once you've pre-trained that network, you can use the internal representation it has learned as input to something that you train supervised, or whatever. That's been incredibly successful. It's not as successful for images, although it's making progress, and it's based on sort of manual data augmentation; we can go into this later. What has not been successful yet is training from video: getting a machine to learn to represent the visual world, for example, by just watching video. Nobody has really succeeded in doing this.

Okay, well, let's give a high-level overview. What's the difference, in kind and in difficulty, between vision and language?
So you said people haven't been able to really crack the problem of vision in terms of self-supervised learning, but that may not necessarily be because it's fundamentally more difficult. Maybe, when we're talking about achieving something like passing the Turing test in the full spirit of the Turing test, language might be harder than vision; that's not obvious. So in your view, which is harder? Or are they just the same problem, where the farther we get toward solving each, the more we realize it's all the same thing, all the same cake?

I think what I'm looking for are methods that make them look essentially like the same cake, but currently they're not. The main issue with learning world models, or learning predictive models, is that the prediction is never a single thing, because the world is not entirely predictable. It may be deterministic or stochastic, and we could get into the philosophical discussion about that, but even if it's deterministic, it's not entirely predictable. So if I play a short video clip and then ask you to predict what's going to happen next, there are many, many plausible continuations for that video clip, and the number of continuations grows with the interval of time that you're asking the system to make a prediction for. So one big question with self-supervised learning is how you represent this uncertainty: how you represent multiple discrete outcomes, how you represent a continuum of possible outcomes, et cetera. If you are a classical machine learning person, you say, oh, you just represent a distribution, and that we know how to do when we're predicting missing words in text, because you can have a neural net give a score for every word in the dictionary. It's a big list of numbers, maybe a hundred thousand or so, and you can turn them into a probability distribution. That tells you, when I say a sentence like "the cat is chasing the blank in the kitchen," that there are only a few words that make sense there: it could be a mouse, or it could be a laser spot, or something like that. And if I say "the blank is chasing the blank in the savannah," you also have a bunch of plausible options for those two words, because you have an underlying reality that you can refer to in order to fill in those blanks. You cannot say for sure, in the savannah, if it's a lion or a cheetah or whatever, and you cannot know if it's a zebra or a gnu or a wildebeest, same thing, but you can represent the uncertainty with just a long list of numbers. Now, if I do the same thing with video and I ask you to predict a video clip, it's not a discrete set of potential frames. You have to have some way of representing an essentially infinite number of plausible continuations of multiple frames in a high-dimensional continuous space, and we just have no idea how to do this properly.

A finite but high-dimensional space. So for the words, you can get it down to a small finite set, like under a million, something like that?

Something like that.

I mean, it's kind of ridiculous that we're doing a distribution over every single possible word for language, and it works. It feels like a really dumb way to do it; it seems like there should be some more compressed representation of the distribution of the words.

You're right about that, and I agree.
Do you have any interesting ideas about how to represent all of reality in a compressed way, such that you can form a distribution over it?

That's one of the big questions: how do you do that? Another thing that is, I shouldn't say stupid, but simplistic about current approaches to self-supervised learning in NLP, in text, is that not only do you represent a giant distribution over words, but for multiple words that are missing, those distributions are essentially independent of each other, and you don't pay too much of a price for this. So in the sentence I gave earlier, the system gives a certain probability for lion and cheetah, and then a certain probability for gazelle, wildebeest, and zebra, and those two distributions are independent of each other. And it's not the case that those things are independent: lions actually attack bigger animals than cheetahs do. So there is a huge independence hypothesis in this process which is not actually true. The reason for this is that we don't know how to properly represent distributions over combinatorial sequences of symbols, essentially, because the number of combinations grows exponentially with the length of the sequence, and so we have to use tricks for this, and those tricks don't really deal with it. So the big question is: could there be some sort of abstract latent representation of text that would say that when I switch lion for cheetah, I also have to switch zebra for gazelle?

Yeah. So, on this independence assumption, let me throw some criticism at you that I often hear, and see how you respond. This kind of filling in the blanks is just statistics: you're not learning the deep underlying concepts, you're just mimicking stuff from the past, you're not learning anything new such that you can use it to generalize about the world. Okay, let me just say the crude version, which is: it's just statistics, it's not intelligence. What do you usually say to that, if you hear this kind of thing?

I don't get into those discussions, because they're kind of pointless. First of all, it's quite possible that intelligence is just statistics, just statistics of a particular kind.

This is the philosophical question: is it possible that intelligence is just statistics?

Yeah, but what kind of statistics? If you are asking the question, do the models of the world that we learn have some notion of causality, then yes. So if the criticism comes from people who say current machine learning systems don't care about causality, which by the way is wrong, I agree with them: your model of the world should have your actions as one of its inputs, and that will drive you to learn causal models of the world, where you know what intervention in the world will cause what result. Or you can do this by observing other agents acting in the world and observing the effects, other humans for example. So I think at some level of description, intelligence is just statistics, but that doesn't mean you won't have models that have deep, mechanistic explanations for what goes on.
The question is how you learn them; that's the question I'm interested in, because a lot of people who voice that criticism say that those mechanistic models have to come from someplace else: they have to come from human designers, they have to come from I don't know what. And obviously we learn them, or if we don't learn them as individuals, nature learned them for us using evolution. So regardless of what you think, those processes have been learned somehow.

So if you look at the human brain: when we humans introspect about how the brain works, it seems like when we think about what intelligence is, we think about the high-level stuff, the models we've constructed, concepts from cognitive science like memory and reasoning modules, almost like high-level modules. Is that a good analogy? Are we ignoring the dark matter, the basic low-level mechanisms, just like we ignore the way the operating system works when we're just using the high-level software? Are we ignoring that at the low level the neural network might be doing something like statistics, sorry to use this word probably incorrectly and crudely, doing this kind of fill-in-the-gap learning, just constantly updating the model in order to be able to predict the raw sensory information and adjust when the prediction is wrong? But when we look at our brain at the high level, it feels like we're playing chess: we're playing with high-level concepts, and we're stitching them together, and we're putting them into long-term memory, but really what's going on underneath is something we're not able to introspect, this kind of simple, large neural network that's just filling in the gaps.

Right. Well, okay, there are a lot of questions and answers there. First of all, there's a whole school of thought in neuroscience, computational neuroscience in particular, that likes the idea of predictive coding, which is really related to the idea I was talking about in self-supervised learning: everything is about prediction. The essence of intelligence is the ability to predict, and everything the brain does is trying to predict everything from everything else. That's really the underlying principle, if you want, that self-supervised learning is trying to reproduce: this idea of prediction as kind of an essential mechanism of task-independent learning, if you want.

The next step is: what kind of intelligence are you interested in reproducing? Of course, we all think about trying to reproduce high-level cognitive processes in humans, but with machines we're not even at the level of reproducing the learning processes in a cat brain. Our most intelligent systems don't have as much common sense as a house cat. So how is it that cats learn? Cats don't do a whole lot of reasoning, but they certainly have causal models, because many cats can figure out how they can act on the world to get what they want. They certainly have a fantastic model of intuitive physics, certainly of the dynamics of their own bodies, but also of prey and things like that. So they're pretty smart, and they do this with only about 800 million neurons. We are not anywhere close to reproducing
this kind of thing. So to some extent I could say: let's not even worry about the high-level cognition and long-term planning and reasoning that humans can do until we figure out whether we can even reproduce what cats are doing. Now, that said, this ability to learn world models is, I think, the key to the possibility of building learning machines that can also reason. Whenever I give a talk, I say there are three main challenges in machine learning. The first one is getting machines to learn to represent the world, and for that I'm proposing self-supervised learning. The second is getting machines to reason in ways that are compatible with essentially gradient-based learning, because this is what deep learning is all about, really. And the third one is something we have no idea how to solve, at least I have no idea how to solve: can we get machines to learn hierarchical representations of action plans? We know how to train them to learn hierarchical representations of perception, with convolutional nets and things like that, and transformers, but what about action plans? Can we get them to spontaneously learn good hierarchical representations of actions?

Also gradient-based?

Yeah, all of that needs to be somewhat differentiable so that you can apply gradient-based learning, which is really what deep learning is about.

So there's background knowledge; there's the ability to reason in a way that's differentiable, that is somehow deeply integrated with that background knowledge, or builds on top of it; and then, given that background knowledge, being able to make hierarchical plans in the world.

Right. If you take classical optimal control, there's something in classical optimal control called model predictive control, and it's been around since the early 60s; NASA uses it to compute trajectories of rockets. The basic idea is that you have a predictive model of the rocket, or whatever system you intend to control, which, given the state of the system at time t and given an action you're taking on the system, so for a rocket that would be thrust and all the controls you have, gives you the state of the system at time t plus delta t. Basically a differential equation, something like that. If you have this model, in the form of some sort of neural net or some set of formulas that you can back-propagate gradients through, you can do what's called model predictive control, or gradient-based model predictive control. You can unroll that model in time, you feed it a hypothesized sequence of actions, and you have some objective function that measures how well, at the end of the trajectory, the system has succeeded or matched what you wanted it to do. Is it a robot arm: have you grasped the object you want to grasp? If it's a rocket: are you at the right place near the space station? Things like that. And by back-propagation through time, and again this was invented in the 1960s by optimal control theorists, you can figure out what is the optimal sequence of actions that will get my system to the best final state. That's a form of reasoning; it's basically planning, and a lot of planning systems in robotics are actually based on this. You can think of it as a form of reasoning.
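A minimal sketch of the gradient-based model-predictive control recipe just described: unroll a differentiable model of the system over a horizon, score the final state, and back-propagate through time to improve the hypothesized action sequence. The point-mass dynamics, horizon, and goal below are made-up illustrative choices, not any real controller.

```python
import torch

dt, horizon = 0.1, 30
goal = torch.tensor([5.0, 0.0])                     # reach position 5 with zero velocity

def dynamics(state, action):
    # state = (position, velocity); action = acceleration (the "thrust")
    pos, vel = state
    return torch.stack([pos + vel * dt, vel + action * dt])

actions = torch.zeros(horizon, requires_grad=True)  # hypothesized sequence of actions
opt = torch.optim.Adam([actions], lr=0.1)

for step in range(200):
    state = torch.tensor([0.0, 0.0])
    for t in range(horizon):                        # unroll the model in time
        state = dynamics(state, actions[t])
    cost = ((state - goal) ** 2).sum() + 1e-3 * (actions ** 2).sum()
    opt.zero_grad()
    cost.backward()                                 # back-propagation through time
    opt.step()

print("final state:", state.detach())
```

In a receding-horizon setup, only the first action of the optimized sequence would be executed before re-planning from the new state.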
To take the example of the teenager driving a car again: you have a pretty good dynamical model of the car. It doesn't need to be very accurate, but you know, again, that if you turn the wheel to the right and there is a cliff, you're going to run off the cliff. You don't need a very accurate model to predict that, and you can run this in your mind and decide not to do it for that reason, because you can predict in advance that the result is going to be bad. So you can sort of imagine different scenarios and then employ, or take the first step in, the scenario that is most favorable, and then repeat the process of planning. That's called receding horizon model predictive control. All of those things have names, going back decades. Now, in classical optimal control, the model of the world is not generally learned. Sometimes there are a few parameters you have to identify, that's called system identification, but generally the model is mostly deterministic and mostly built by hand. So the big question of AI, I think the big challenge of AI for the next decade, is how we get machines to learn predictive models of the world that deal with uncertainty and deal with the real world in all its complexity. It's not just the trajectory of a rocket, which you can reduce to first principles; it's not even just the trajectory of a robot arm, which again you can model with careful mathematics. It's everything else: everything you observe in the world, people's behavior, physical systems that involve collective phenomena like water, or trees and the branches of a tree, or complex things that humans have no trouble developing abstract representations and predictive models for, but that we still don't know how to do with machines.

Where do you put, among these three, maybe in the planning stage, the game-theoretic nature of this world, where your actions not only respond to the dynamic nature of the environment but also affect it? If there are other humans involved, is this point number four, or is it somehow integrated into the hierarchical representation of actions, in your view?

I think it's integrated; it's just that now your model of the world has to deal with it. It just makes it more complicated. The fact that humans are complicated and not easily predictable makes your model of the world much more complicated.

That much more complicated. Well, I suppose chess is an analogy: Monte Carlo tree search, I go, you go, I go, you go. Andrej Karpathy recently gave a talk at MIT about car doors. I think there was some machine learning in there too, but mostly car doors. And there's a dynamic nature to it, like the person opening the door, checking; he wasn't talking about that, he was talking about the perception problem, the ontology of what defines a car door, this big philosophical question. But to me it was interesting because it's obvious that the person opening the car door is trying to get out, like here in New York, trying to get out of the car. You slowing down is going to signal something, you speeding up is going to signal something, and that's a dance, an asynchronous chess game. I don't know, it feels like it's not just... I guess you can integrate all of them into one giant model, the entirety of these little interactions, because it's not as complicated as chess. It's
just a little dance we do, a little dance together, and then we figure it out.

Well, in some ways it's way more complicated than chess, because it's continuous, it's uncertain in a continuous manner.

It doesn't feel more complicated.

It doesn't feel more complicated because that's what we are: this is the kind of problem we've evolved to solve, so we're good at it, because nature has made us good at it. Nature has not made us good at chess; we completely suck at chess.

Yeah.

In fact, that's why we designed it as a game, to be challenging. And if there's something that recent progress in chess and Go has made us realize, it's that humans are really terrible at those things, like really bad. There was a story, right before AlphaGo, that the best Go players thought they were maybe two or three stones behind an ideal player they would call God. In fact, no, they're more like nine or ten stones behind. We're just bad. And it's because we have limited working memory; we're not very good at doing the tree exploration that computers are much better at than we are. But we are much better at learning differentiable models of the world. I say "differentiable" not in the sense that we run backprop through it, but in the sense that our brain has some mechanism for estimating gradients of some kind, and that's what makes us efficient.

So if you have an agent that consists of a model of the world, which in the human brain is basically the entire front half of your brain, and an objective function, which in humans is a combination of two things: there's your intrinsic motivation module, which is in the basal ganglia, at the base of your brain, the thing that measures pain and hunger and things like that, immediate feelings and emotions; and then there's the equivalent of what people in reinforcement learning call a critic, a sort of module that predicts ahead of time what the outcome of a situation will be. So it's not an objective function itself, but it's a trained predictor of the ultimate objective function, and that also is differentiable. And if all of this is differentiable, your cost function, your critic, your world model, then you can use gradient-based methods to do planning, to do reasoning, to do learning, to do all the things that you would like an intelligent agent to do.

And gradient-based learning, what's your intuition, that's probably at the core of what can solve intelligence? So you don't need logic-based reasoning, in your view?

I don't know how to make logic-based reasoning compatible with efficient learning. There is a big question, perhaps a philosophical question, well, it's not that philosophical, but we can ask it: all the learning algorithms we know from engineering and computer science proceed by optimizing some objective function. So one question we may ask is, does learning in the brain minimize an objective function? It could be a composite of multiple objective functions, but it's still an objective function. Second, if it does optimize an objective function, does it do so by some sort of gradient estimation? It
doesn't need to be backprop, but some way of estimating the gradient in an efficient manner, whose complexity is on the same order of magnitude as actually running the inference, because you can't afford to do things like perturbing a weight in your brain to figure out what the effect is, estimating the gradient by perturbation. To me, it seems very implausible that the brain uses some sort of zeroth-order, black-box, gradient-free optimization, because it's so much less efficient than gradient-based optimization. So it has to have a way of estimating gradients.

Is it possible that some kind of logic-based reasoning emerges in pockets as something useful? Like you said, if the brain is optimizing an objective function, maybe it's a mechanism for creating objective functions, or a mechanism for creating knowledge bases, for example, that can then be queried. Maybe it's an efficient representation of knowledge that's learned in a gradient-based way, or something like that.

Well, I think there are a lot of different types of intelligence. First of all, the type of logical reasoning that we think about, maybe stemming from classical AI of the 1970s and 80s, I think humans use relatively rarely and are not particularly good at. But we judge each other based on our ability to solve those rare problems: it's called an IQ test.

I think so. I'm not very good at chess.

Yes, I've been judging you this whole time.

Well, with your heritage, I'm sure you're good at chess.

No. Stereotypes. Not all stereotypes are true. Well, I'm terrible at chess. But I think perhaps another type of intelligence that I have is this ability to build models of the world from reasoning, obviously, but also from data, and those models generally are more analogical. It's reasoning by simulation and by analogy, where you use one model and apply it to a new situation: even though you've never seen that situation, you can connect it to a situation you've encountered before, and your reasoning is more akin to some sort of internal simulation. So you're kind of simulating what's happening when you're building, I don't know, a box out of wood or something. You can imagine in advance what the result would be of cutting the wood in this particular way, whether you're going to use screws or nails, or whatever. When you're interacting with someone, you also have a model of that person, and you interact with them with this model in mind, to tell the person what you think is useful to them. So I think this ability to construct models of the world is basically the essence of intelligence, and the ability to use those models to plan actions that will fulfill a particular criterion is, of course, necessary as well.

So I'm going to ask you a series of impossible questions, as we keep doing here. If that's the fundamental dark matter of intelligence, this ability to form a background model, what's your intuition about how much knowledge is required? With dark matter, you can put a percentage on the composition of the universe, how much of it is dark matter, how much is dark energy. How much information do you think is required to be a house cat? You have to
be able to, when you see a box, get in it; when you see a human, compute the most evil action; if there's a thing near an edge, you knock it off. All of that, plus the extra stuff you mentioned, which is a great self-awareness of the physics of your own body and of the world. How much knowledge is required, do you think, to solve it?

I don't even know how to measure an answer to that question. I'm not sure how to measure it, but whatever it is, it fits in about 800 million neurons.

And the representation of everything, all knowledge, everything, fits in that?

Right. It's less than a billion. A dog is two billion, but a cat is less than one billion. Multiply that by a thousand and you get the number of synapses. And I think almost all of it is learned through this sort of self-supervised learning, although I think a tiny flavor of it is learned through reinforcement learning, and certainly very little through classical supervised learning, although it's not even clear how supervised learning actually works in the biological world. So I think almost all of it is self-supervised, but it's driven by the sort of ingrained objective functions that a cat or a human has at the base of their brain, which kind of drives their behavior. Nature tells us "you're hungry"; it doesn't tell us how to feed ourselves. That's something the rest of our brain has to figure out.

Right. Well, it's interesting, because there might be deeper objective functions underlying the whole thing. Hunger may be, now we go into neurobiology, just the brain trying to maintain homeostasis, so hunger is just one of the human-perceivable symptoms of the brain being unhappy with the way things currently are. It could be just one really dumb objective function at the core.

But that's how behavior is driven. The fact that the basal ganglia drive us to do things that are different from what drives, say, an orangutan, or certainly a cat, is what makes human nature versus orangutan nature versus cat nature. For example, our basal ganglia drive us to seek the company of other humans, and that's because nature has figured out that we need to be social animals for our species to survive. It's true of many primates, but it's not true of orangutans: orangutans are solitary animals. They don't seek the company of others; in fact, they avoid them, in fact, they scream at them when they come too close, because they're territorial, because for their survival evolution has figured out that's the best thing. They're occasionally social, of course, for reproduction and things like that, but they're mostly solitary. So all of those behaviors are not part of intelligence. People say, oh, you're never going to have intelligent machines, because human intelligence is social. But then you look at orangutans, you look at the octopus: an octopus never knows its parents, it barely interacts with any others, and it gets to be really smart in less than a year, in like half a year. In a year they're adults, in two years they're dead. So there are things that we as humans think are intimately linked with intelligence, like social interaction, like language. I think we give way too much importance to language as a substrate of intelligence, as humans, because we think our reasoning is so linked with language.

So to solve
the house cat intelligence problem, you think you could do it on a desert island? You could pretty much just have a cat sitting there, looking at the ocean waves, and it would figure a lot of it out?

It needs to have the right set of drives to get it to do the thing and learn the appropriate things. But, for example, baby humans are driven to learn to stand up and walk. That desire is kind of hard-wired; how to do it precisely is not, that's learned, but the desire to walk, move around, and stand up, that's probably hard-wired. It's very simple to hard-wire this kind of stuff.

Oh, like the desire to... well, that's interesting, that you're hard-wired to want to walk. There's got to be a deeper need for walking. I thought it was probably socially imposed, that you need to walk like all the other bipedal...

A lot of simple animals would probably walk without ever watching any other member of the species.

It seems like a scary thing to have to do, because you suck at bipedal walking at first. Crawling is much safer. Why are you in a hurry?

Well, because you have this thing that drives you to do it, which is sort of part of human development.

Is that understood, actually?

Not entirely, no.

What's the reason to get on two feet? It's really hard. Most animals don't get on two feet.

Well, they get on four feet. Many mammals get on four feet, very quickly, some of them extremely quickly.

But, I don't know, the last time I interacted with a table, that's much more stable than two legs. It's just a really hard problem.

Yeah. How many birds have figured it out, with two feet?

Well, technically, we could go into ontology, they have four... I guess they have two feet.

They have two feet: chickens. Dinosaurs had two feet, many of them.

Allegedly. I'm just now learning that T. rex was eating grass, not other animals. T. rex might have been a friendly pet.

What do you think about, I don't know if you've looked at it, the test for general intelligence that François Chollet put together? I don't know if you got a chance to look at that kind of thing. What's your intuition about how to solve an IQ-type test like that?

I don't know. I think it's so outside of my radar screen that it's not really relevant, I think, in the short term.

Well, I guess another way to ask it, perhaps closer to your work: how do you solve MNIST with very little example data?

Right, and the answer to that is probably self-supervised learning: just learn to represent images, and then learning to recognize handwritten digits on top of this will only require a few samples. We observe this in humans: you show a young child a picture book with a couple of pictures of an elephant, and that's it, the child knows what an elephant is. And we see this today with practical systems. We train image recognition systems with enormous amounts of images, either completely self-supervised or very weakly supervised. For example, you can train a neural net to predict whatever hashtags people type on Instagram. You can do this with billions of images, because there are billions per day showing up, so the amount of training data there is essentially unlimited. Then you take the output representation, a couple of layers down from the output of what the system learned, and feed it as input to a classifier for any object in the world that you want, and it works pretty well. So that's transfer learning, or weakly supervised transfer learning.
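A rough sketch of the transfer recipe described above: take an encoder pre-trained on huge amounts of weakly or self-supervised data, freeze it, and train only a small classifier on a handful of labels. The ImageNet-pretrained ResNet-50 (recent torchvision API) and the tiny random "dataset" below are stand-ins for illustration, not the actual Instagram-hashtag pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # any pre-trained encoder
backbone.fc = nn.Identity()            # keep the representation, drop the original head
for p in backbone.parameters():
    p.requires_grad = False            # the encoder stays frozen
backbone.eval()

classifier = nn.Linear(2048, 10)       # e.g. ten digit classes, only a few samples each
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)

images = torch.randn(50, 3, 224, 224)  # pretend few-shot data
labels = torch.randint(0, 10, (50,))

for epoch in range(5):
    with torch.no_grad():
        feats = backbone(images)       # representations from the frozen network
    loss = F.cross_entropy(classifier(feats), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```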
People are making very fast progress using self-supervised learning for this kind of scenario as well, and my guess is that that's going to be the future of self-supervised learning.

How much cleaning do you think is needed for filtering out, "malicious signal" isn't the right term, but, you know, a lot of people use hashtags on Instagram to get good SEO that doesn't fully represent the contents of the image. They'll put a picture of a cat and hashtag it with "science," "awesome," "fun," I don't know, all kinds of things.

Why would you put "science"? That's not very good SEO. The way my colleagues who worked on this project at Facebook, now Meta, dealt with this a few years ago is that they only selected something like 17,000 tags that correspond to kinds of physical things or situations, things that have some visual content, so you wouldn't have #tbt or anything like that.

So they keep a very select set of hashtags, is what you're saying.

Yeah.

Okay, but it's still on the order of 10,000 to 20,000, so it's fairly large.

Okay. Can you tell me about data augmentation? What the heck is data augmentation, and how is it used, maybe in contrastive learning, for video? What are some cool ideas here?

Right. So, first, data augmentation is the idea of artificially increasing the size of your training set by distorting the images that you have in ways that don't change the nature of the image. You can do data augmentation on MNIST, and people have done this since the 1990s: you take a handwritten digit and you shift it a little bit, or you change the size, rotate it, skew it, add noise, et cetera. And it works: if you train a supervised classifier with augmented data, you're going to get better results. Now, it's become really interesting over the last couple of years, because a lot of self-supervised learning techniques for pre-training vision systems are based on data augmentation. The basic technique is originally inspired by techniques that I worked on in the early 90s, and that Geoff Hinton also worked on in the early 90s; it was sort of parallel work. I used to call these Siamese networks: basically, you take two identical copies of the same network, they share the same weights, and you show them two different views of the same object. Those two different views may have been obtained by data augmentation, or maybe it's two different views of the same scene from a camera that you moved, or at different times, or two pictures of the same person, things like that. And then you train this neural net, those two identical copies, to produce an output representation, a vector, in such a way that the representations for those two images are as close to each other as possible, as identical as possible, because you want the system to learn a function that will be invariant, whose output will not change when you transform those inputs in those particular ways.
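A minimal sketch of the two-branch, shared-weights ("Siamese") setup just described: two randomly distorted views of the same image go through the same encoder, and training would pull their representations together. The specific augmentations and the ResNet-18 encoder are illustrative choices, not anyone's exact recipe.

```python
import numpy as np
import torch.nn as nn
from PIL import Image
from torchvision import models, transforms

augment = transforms.Compose([              # distortions that keep the content the same
    transforms.RandomResizedCrop(224),      # shift / rescale ("cropping")
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])

encoder = models.resnet18(weights=None)     # both branches share these weights
encoder.fc = nn.Identity()

def embed_two_views(pil_image):
    v1, v2 = augment(pil_image), augment(pil_image)   # two different views of one image
    z1 = encoder(v1.unsqueeze(0))
    z2 = encoder(v2.unsqueeze(0))
    return z1, z2                            # training pushes these two vectors together

img = Image.fromarray(np.uint8(np.random.rand(256, 256, 3) * 255))
z1, z2 = embed_two_views(img)
```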
That's easy to do. What's complicated is: how do you make sure that when you show two images that are different, the system will produce different things? Because if you don't have a specific provision for this, the system will just ignore the input. When you train it, it will end up ignoring the input and just producing a constant output vector that is the same for every input. That's called a collapse. Now, how do you avoid collapse? There are two ideas. One idea, which I proposed in the early 90s with my colleagues at Bell Labs, Jane Bromley and a couple of other people, is what we now call contrastive learning: having negative examples. You have pairs of images that you know are different, you show them to the network, to those two copies, and you push the two output vectors away from each other. That will eventually guarantee that things that are semantically similar produce similar representations and things that are different produce different representations. We actually came up with this idea for a signature verification project: we would collect multiple signatures from the same person and train a neural net to produce the same representation for them, and then force the system to produce different representations for different signatures. The problem was actually proposed by people from what was a subsidiary of AT&T at the time, called NCR. They were interested in storing a representation of the signature in the 80 bytes of the magnetic strip of a credit card, so we came up with the idea of a neural net with 80 outputs that we would quantize into bytes, so that we could encode it.

And that encoding was then used to compare whether a signature matched or not?

That's right. You would sign, the signature would run through the neural net, and then you would compare the output vector to whatever is stored on your card. It actually worked, but they ended up not using it, because nobody cares, actually. The American financial payment system is incredibly lax in that respect compared to Europe.

Oh, with the signatures. What's the purpose of signatures anyway? Nobody looks at them, nobody cares.

Yeah. So that's contrastive learning: you need positive and negative pairs. And the problem with that is that, even though I wrote the original paper on this, I'm actually not very positive about it, because it doesn't work in high dimensions. If your representation is high-dimensional, there are just too many ways for two things to be different, and so you would need lots and lots and lots of negative pairs. There is a particular implementation of this which is relatively recent, from the Google Toronto group, where Geoff Hinton is the senior member; it's called SimCLR, and it's basically a particular way of implementing this idea of contrastive learning, with a particular objective function.
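One common form of the contrastive loss described above, pulling positive pairs together and pushing negative pairs apart up to a margin; the margin, batch size, and embedding dimension are arbitrary here, and SimCLR itself uses a different, softmax-style objective over a batch.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_a, z_b, same, margin=1.0):
    """z_a, z_b: embeddings of an image pair; same = 1 for positive pairs, 0 for negatives."""
    dist = F.pairwise_distance(z_a, z_b)
    pull = same * dist.pow(2)                             # positives: pull together
    push = (1 - same) * F.relu(margin - dist).pow(2)      # negatives: push past the margin
    return (pull + push).mean()

z_a, z_b = torch.randn(16, 64), torch.randn(16, 64)
same = torch.randint(0, 2, (16,)).float()
print(float(contrastive_loss(z_a, z_b, same)))
```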
Now, what I'm much more enthusiastic about these days is non-contrastive methods: other ways to guarantee that the representations will be different for different inputs. It's actually based on an idea that Geoff Hinton proposed in the early 90s with a student at the time, Sue Becker, and it's based on maximizing the mutual information between the outputs of two systems. You only show positive pairs, pairs of images that you know are somewhat similar, and you train the two networks to be informative, but also to be as informative of each other as possible, so that basically one representation has to be predictable from the other, essentially. He proposed that idea and had a couple of papers on it in the early 90s, and then nothing was done about it for decades. I kind of revived this idea together with my postdocs at FAIR, particularly a postdoc called Stéphane Deny, who is now a junior professor in Finland, at Aalto University. We came up with something we call Barlow Twins, which is a particular way of maximizing the information content of a vector, using some hypotheses, and we have another, more recent version of it called VICReg: variance-invariance-covariance regularization. It's the thing I'm the most excited about in machine learning in the last 15 years; I'm really, really excited about this.
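A rough sketch of a VICReg-style, non-contrastive criterion as described above: an invariance term keeps the two views' embeddings close, a variance term keeps each embedding dimension from collapsing to a constant, and a covariance term decorrelates the dimensions. The coefficients and details here are illustrative assumptions; the published Barlow Twins and VICReg recipes differ (for example, VICReg adds an expander network on top of the encoder).

```python
import torch
import torch.nn.functional as F

def vicreg_style_loss(z1, z2, inv_w=25.0, var_w=25.0, cov_w=1.0):
    n, d = z1.shape
    invariance = F.mse_loss(z1, z2)              # positive pairs only, no negatives needed

    def variance(z):                             # hinge on the per-dimension std
        std = z.var(dim=0).add(1e-4).sqrt()
        return F.relu(1.0 - std).mean()

    def covariance(z):                           # penalize off-diagonal covariance
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (n - 1)
        off_diag = cov - torch.diag(torch.diagonal(cov))
        return off_diag.pow(2).sum() / d

    return (inv_w * invariance
            + var_w * (variance(z1) + variance(z2))
            + cov_w * (covariance(z1) + covariance(z2)))

z1, z2 = torch.randn(128, 64), torch.randn(128, 64)
print(float(vicreg_style_loss(z1, z2)))
```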
What kind of data augmentation is useful for that non-contrastive learning method we're talking about? Does that not matter that much, or is it a very important part of the step, how you generate the images that are similar but sufficiently different?

That's right, it's an important step, and it's also an annoying step, because you need to have the knowledge of which augmentations you can apply that do not change the nature of the object. The standard scenario, which a lot of people working in this area are using, is a catalog of distortions. Basically you do geometric and photometric distortions: one just shifts the image a little bit, that's called cropping; another one changes the scale a little bit; another one rotates it; another one changes the colors, a shift in color balance or saturation or something like that; another one blurs it; another one adds noise. So you have a catalog of standard things, and people try to use the same ones for different algorithms so that they can compare. But some self-supervised algorithms can actually deal with much bigger, more aggressive data augmentation, and some can't, so that kind of makes the whole thing difficult. But that's the kind of distortions we're talking about. So you train with those distortions, then you chop off the last layer or couple of layers of the network, use the representation as input to a classifier, train the classifier on ImageNet, let's say, or whatever, and measure the performance. Interestingly enough, the methods that are really good at eliminating the information that is irrelevant, which here is the distortion between those images, do a good job at eliminating it, and as a consequence you cannot use the representations from those systems for things like object detection and localization, because that information is gone. So the type of data augmentation you need to do depends on the task you eventually want the system to solve, and the standard data augmentations we use today are only appropriate for object recognition or image classification; they're not appropriate for things like...

Can you help me understand why? With localization, you're saying it's just not good at classifying the negative, and that's why it can't be used for localization?

No, it's just that you train the system by giving it an image and then the same image shifted and scaled, and telling it that's the same image, so the system is basically trained to eliminate the information about position and size. And now you want to use that representation to figure out where an object is and what size it is, like a bounding box, to be able to actually...

It can still find the object in the image; it's just not very good at finding the exact boundaries of that object.

Interesting. That's an interesting, sort of philosophical question: how important is object localization anyway? We're obsessed with image segmentation, obsessed with measuring perfectly the boundaries of objects, when arguably that's not that essential to understanding the contents of a scene.

On the other hand, I think, evolutionarily, the first vision systems in animals were basically all about localization and very little about recognition. And in the human brain, you have two separate pathways, one for recognizing the nature of a scene or an object, and one for localizing objects. You use the first one, called the ventral pathway, for telling what you're looking at; the other one, the dorsal pathway, is used for navigation, for grasping, for everything else. Basically, a lot of the things you need for survival are localization and detection.

Is similarity learning, or contrastive learning, or these non-contrastive methods, the same as understanding something? Just because you know a distorted cat is the same as a non-distorted cat, does that mean you understand what it means to be a cat?

To some extent. I mean, it's a superficial understanding, obviously.

But what is the ceiling of this method, do you think? Is this just one trick on the path to self-supervised learning, or can we go really, really far?

I think we can go really far. If we figure out how to use techniques of that type, perhaps very different ones but of that nature, to train a system from video to do video prediction, essentially, I think we'll have a path, I wouldn't say towards unlimited intelligence, but towards some level of physical common sense in machines. And I also think that the ability to learn how the world works from a high-throughput channel like vision is a necessary step towards real artificial intelligence. In other words, I believe in grounded intelligence. I don't think we can train a machine to be intelligent purely from text, because I think the amount of information about the world that's contained in text is tiny compared to what we need to know. People have attempted to do this for 30 years, the Cyc project and things like that, basically writing down all the facts that are known and hoping that some sort of common sense will emerge. I think it's basically hopeless. Let me take an example. I describe a situation to you: I take an object, I put it on the table, and I push the table. It's completely obvious to you that the object will be pushed with the table, because it's sitting on it. There's no text in the world, I believe, that explains this, so if you train a machine, as powerful as it could be, your GPT-5000 or whatever it is, it's never going to learn about this. That information is just not present in any text.

Well, with something like the Cyc project, the dream, I think, is to have, say, 10 million facts like that, which give you a head start, like a parent guiding you. Now, we humans don't need a parent to tell us that the table will move, sorry, that the smartphone will move with the table, but we get a lot of guidance in other
so it's possible that we can give it a quick shortcut what about cats they know that no but they evolved no they learned it like us the physics of stuff well yeah so you're saying see you're putting a lot of intelligence onto the nurture side not the nature yes we seem to have there's a very inefficient arguably process of evolution that got us from bacteria to who we are today started at the bottom now we're here so true the question is how okay so the question is how fundamental is that the nature the whole hardware and then is there any way to shortcut it if it's fundamental and if it's not if most of intelligence most of the cool stuff we've been talking about is mostly nurture mostly trained we figure it out by observing the world we can form that big beautiful sexy background model that you're talking about just by sitting there then okay then maybe it is all self-supervised learning all the way down self-supervised learning right whatever it is that makes human intelligence different from other animals which a lot of people think is language and logical reasoning and this kind of stuff it cannot be that complicated because it only popped up in the last million years yeah and it only involves less than one percent of our genome maybe which is the difference between the human genome and chimps or whatever so it can't be that complicated it can't be that fundamental i mean most of the complicated stuff already exists in cats and dogs and certainly non-human primates yeah that little thing with humans might be just something about social interaction and the ability to maintain ideas across like a collective of people it sounds very dramatic and very impressive but it probably isn't mechanistically speaking it is but we're not there yet like we have i mean this is number 634 in the list of problems to solve so basic physics of the world is number one what do you just a quick tangent on data augmentation so a lot of it is hard-coded versus learned do you have any intuition that maybe there could be some weird data augmentation like generative type of data augmentation like doing something weird to images which then improves the similarity learning process so not just kind of dumb simple distortions but you're shaking your head just saying that even simple distortions are enough i think no i think data augmentation is a temporary necessary evil so what people are working on now is two things one is the type of self-supervised learning where you try to translate the type of self-supervised learning people use in language over to images which is basically the denoising autoencoder method right so you take an image you block you mask some parts of it and then you train some giant neural net to reconstruct the parts that are missing and until very recently there were no working methods for that all the autoencoder type methods for images weren't producing very good representations but there's a paper now coming out of the fair group in menlo park that actually works very well so that doesn't require data augmentation it requires only masking okay only masking for images okay right so you mask part of the image and you train a system which in this case is a transformer because the transformer represents the image as non-overlapping patches so it's easy to mask patches and things like that
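a rough sketch of the masking idea described here, assuming the image has already been cut into a tensor of non-overlapping patches; the 75 percent masking ratio and the squared-error reconstruction mentioned in the comment are illustrative assumptions, not necessarily the recipe of the fair paper referenced above

import torch

def random_patch_mask(patches, mask_ratio=0.75):
    # patches: tensor of shape (num_patches, patch_dim)
    num_patches = patches.shape[0]
    num_keep = int(num_patches * (1 - mask_ratio))
    perm = torch.randperm(num_patches)
    keep_idx = perm[:num_keep]      # visible patches, fed to the encoder
    masked_idx = perm[num_keep:]    # patches a decoder must reconstruct
    return patches[keep_idx], keep_idx, masked_idx

training would then minimize for example the mean squared error between the reconstructed masked patches and the original ones, which is the denoising autoencoder principle discussed below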
okay then my question transfers to that problem the masking like why should the mask be a square or a rectangle so it doesn't matter i think we're probably going to come up in the future with ways to mask that are kind of random essentially well i mean they are random already but no no but like something that's challenging like optimally challenging so i mean maybe it's a metaphor that doesn't apply but it seems like with data augmentation or masking there's an interactive element to it like you're almost playing with an image like the way we play with an image in our minds no but it's like dropout it's like boltzmann machine training every time you see a percept you can perturb it in some way and then the principle of the training procedure is to minimize the difference of the output of the representation between the clean version and the corrupted version essentially right and you can do this in real time right so boltzmann machines work like this right you show a percept and you tell the machine that's a good combination of activities of your input neurons and then you either let them go their merry way without clamping them to values or you only do this with a subset yeah and what you're doing is you're training the system so that the stable state of the entire network is the same regardless of whether it sees the entire input or only part of it the denoising autoencoder method is basically the same thing right you're training a system to reproduce the complete input and fill in the blanks regardless of which parts are missing and that's really the underlying principle and you could imagine even in the brain some sort of neural principle where neurons kind of oscillate right so they take their activity and then temporarily they kind of shut off to force the rest of the system to basically reconstruct the input without their help and i mean you could imagine more or less biologically plausible processes and i guess with this denoising autoencoder and masking and data augmentation you don't have to worry about being super efficient you can just do as much as you want yeah and get better over time because i was thinking you might want to be clever about the way you do all these procedures you know but that's only if it's somehow costly to do every iteration but it's not really not really maybe and then there is data augmentation without explicit data augmentation which is data augmentation by waiting which is the sort of video prediction you're observing a video clip observing the continuation of that video clip and you try to learn a representation using those joint embedding architectures in such a way that the representation of the future clip is easily predictable from the representation of the observed clip do you think youtube has enough raw data from which to learn how to be a cat i think so so the amount of data is not the constraint no it would require some selection i think some selection of maybe the right type of data you know down the rabbit hole of just cat videos you might need to watch some lectures or something no you wouldn't
how meta would that be if it watches lectures about intelligence and then learns watches your lectures at nyu and learns from that how to be intelligent do you find multi-modal learning interesting we've been talking about vision and language like combining those together maybe audio all those kinds of things there are a lot of things that i find interesting in the short term but that are not addressing the important problems that i think are really kind of the big challenges so i think things like multitask learning continual learning adversarial issues i mean those have great practical interest in the relatively short term possibly but i don't think they're fundamental active learning even to some extent reinforcement learning i think those things will become either obsolete or useless or easy once we figure out how to do self-supervised representation learning or learning predictive world models and so i think that's what the entire community should be focusing on at least the people who are interested in sort of fundamental questions or really kind of pushing the envelope of ai towards the next stage but of course there's a huge amount of very interesting work to do on sort of practical questions that have short-term impact well it's difficult to talk about the temporal scale because all of human civilization will eventually be destroyed because the sun will die out and even if elon musk is successful with multi-planetary colonization across the galaxy eventually the entirety of it will just become giant black holes and that's going to take a while though so what i'm saying is that that logic can be used to say it's all meaningless i'm saying all that to say that multitask learning might be what you're calling practical or pragmatic or whatever that might be the thing that achieves something very akin to intelligence while we're trying to solve the more general problem of self-supervised learning and background knowledge so the reason i bring that up maybe one way to ask that question i've been very impressed by what the tesla autopilot team is doing i don't know if you got a chance to glance at this particular one example of multi-task learning where they're literally taking the problem like i don't know charles darwin starts studying animals they're studying the problem of driving and asking okay what are all the things you have to perceive and the way they're solving it is one there's an ontology where you're bringing that to the table so you're formulating a bunch of different tasks it's like over a hundred tasks or something like that that are involved in driving and then they're deploying it and then getting data back from people that run into trouble and they're trying to figure out do we add tasks do we focus on each individual task separately sure in fact i would classify andrej karpathy's talk in two ways so one was about doors and the other one about how much imagenet sucks he kept going back and forth on those two topics where imagenet sucks meaning you can't just use a single benchmark you have to have like a giant suite of benchmarks to understand how well your system actually works i agree with him i mean he's a very sensible guy now okay it's very clear that if you're faced with an engineering problem that you need to solve in a relatively short time particularly if you have someone breathing down your neck
you're going to have to take shortcuts right you might think about the fact that the right thing to do the long-term solution involves some fancy self-supervised learning but you have someone breathing down your neck and this involves human lives and so you have to basically just do the systematic engineering and fine tuning and refinements and trial and error and all that stuff there's nothing wrong with that that's called engineering that's called putting technology out in the world and you have to kind of ironclad it before you do this so much for grand ideas and principles but you know i'm placing myself sort of upstream of this quite a bit upstream of this you're plato thinking about platonic forms you're platonic because eventually i want that stuff to get used but it's okay if it takes five or ten years for the community to realize this is the right thing to do i've done this before it's been the case before that i've made that case i mean if you look back in the mid-2000s for example and you ask yourself the question okay i want to recognize cars or faces or whatever i can use convolutional nets or i can use more conventional kinds of computer vision techniques using interest point detectors or sift features and sticking an svm on top at that time the data sets were so small that those methods that used more hand engineering worked better than convnets there was just not enough data for convnets and convnets were a little slow with the kind of hardware that was available at the time and there was a sea change basically when data sets became bigger and gpus became available those are the two main factors that basically made people change their mind and you can look at the history of all the sub-branches of ai or pattern recognition and there's a similar trajectory followed by techniques where people start by engineering the hell out of it be it optical character recognition speech recognition computer vision like image recognition in general natural language understanding like translation things like that right you start by engineering the hell out of it you start to acquire all the knowledge the prior knowledge about image formation about the shape of characters about morphological operations about feature extraction fourier transforms moments whatever right people have come up with thousands of ways of representing images so that they could be easily classified afterwards same for speech recognition right it took two decades for people to figure out a good front end to pre-process speech signals so that the information about what is being said is preserved but most of the information about the identity of the speaker is gone cepstral coefficients or whatever right and same for text right you do named entity recognition and you parse and you do tagging of the parts of speech and you do this sort of tree representation of clauses and all that stuff right before you can do anything so that's how it starts right just engineer the hell out of it and then you start having data
and maybe you have more powerful computers maybe you know something about statistical learning so you start using machine learning and it's usually a small sliver on top of your kind of handcrafted system where you extract features by hand okay and nowadays the standard way of doing this is that you train the entire thing end to end with a deep learning system and it learns its own features and speech recognition systems nowadays or ocr systems are completely end-to-end it's some giant neural net that takes raw waveforms and produces a sequence of characters coming out and it's just a huge neural net right there's no markov model there's no language model that is explicit other than something that's ingrained in the sort of neural language model if you want same for translation same for all kinds of stuff so you see this continuous evolution from less and less hand crafting and more and more learning and i think it's true in biology as well so i mean we might disagree about this maybe not in this one little piece at the end you mentioned active learning it feels like active learning which is the selection of data and also the interactivity needs to be part of this giant neural network you cannot just be an observer to do self-supervised learning you have to well self-supervised learning is just a word but whatever this giant stack of a neural network that's automatically learning it feels my intuition is that you have to have a system whether it's a physical robot or a digital robot that's interacting with the world and doing so in a flawed way and improving over time in order to form the self-supervised learning you can't just give it a giant sea of data okay i agree and i disagree i agree in two ways the first way i agree is that if you want and you certainly need a causal model of the world that allows you to predict the consequences of your actions then to train that model you need to take actions right you need to be able to act in the world and see the effect for you to learn causal models of the world well that's not obvious because you can observe others you can observe others and you can infer that they're similar to you and then you can learn from that yeah but then you have to kind of hardwire that part right mirror neurons and all that stuff right and it's not clear to me how you would do this in a machine so i think the action part would be necessary for having causal models of the world the second reason it may be necessary or at least more efficient is that active learning basically goes for the jugular of what you don't know right the obvious areas of uncertainty about your world and about how the world behaves and you can resolve this uncertainty by systematic exploration of that part that you don't know and if you know that you don't know then it makes you curious you kind of look into those situations and across the animal world different species have different levels of curiosity right depending on how they're built right so cats and rats are incredibly curious dogs not so much i mean less yeah so it could be useful to have that kind of curiosity it would be useful but curiosity just makes the process faster it doesn't make the process exist
so what learning process is it that active learning makes more efficient i'm asking that first question and we haven't answered that question yet so i'll worry about active learning once this question is solved it's the more fundamental question to ask and if active learning or interaction increases the efficiency of the learning see sometimes it becomes very different if the increase is several orders of magnitude right that's true but fundamentally it's still the same thing and building up the intuition about how to construct background models in a self-supervised way efficiently or inefficiently is the core problem what do you think about yoshua bengio's talking about consciousness and all of these kinds of concepts okay i don't know what consciousness is but it's a good opener and to some extent a lot of the things that are said about consciousness remind me of the questions people were asking themselves in the 17th or 18th century when they discovered how the eye works and the fact that the image at the back of the eye was upside down right because you have a lens and so on your retina the image that forms is an image of the world but it's upside down how is it that you see right side up and with what we know today in science we realize this question doesn't make any sense or is kind of ridiculous in some way right so i think a lot of what is said about consciousness is of that nature now that said there are a lot of really smart people for whom i have a lot of respect who are talking about this topic people like david chalmers who is a colleague of mine at nyu i have kind of an unorthodox folk speculative hypothesis about consciousness so we were talking about this idea of a world model and i think our entire prefrontal cortex basically is the engine for our world model but when we are attending to a particular situation we're focused on that situation we basically cannot attend to anything else and that seems to suggest that we basically have only one world model engine in our prefrontal cortex that engine is configurable to the situation at hand so whether we are building a box out of wood or we are driving down the highway or playing chess we basically have a single model of the world that we configure for the situation at hand which is why we can only attend to one task at a time now if there is a task that we do repeatedly it goes from the sort of deliberate reasoning using a model of the world and prediction and perhaps something like model predictive control which i was talking about earlier to something that is more subconscious that becomes automatic so i don't know if you've ever played against a chess grandmaster i get wiped out in like 10 plies right and i have to think about my move for like 15 minutes and the person in front of me the grandmaster would just react within seconds right he doesn't need to think about it that's become part of the subconscious because it's basically just pattern recognition at this point same with driving the first few hours you drive a car you're really attentive you can't do anything else and then after 20 or 30 hours of practice 50 hours it becomes subconscious you can talk to the person next to you things like that right unless the situation becomes unpredictable and then you have to stop talking so that suggests you only have one model in your
head and it might suggest the idea that consciousness basically is the module that configures this world model of yours you know you need to have some sort of executive kind of overseer that configures your word model for the situation at hand and that that leads to kind of the really curious concept that consciousness is not a consequence of the power of our minds but of the limitation of our brains but because we have only one world model we have to be conscious if we had as many role models as there are situations we encounter then we could do all of them simultaneously and we wouldn't need this sort of executive control that we call consciousness yeah interesting and somehow maybe that executive controller i mean the the hard problem of consciousness there's some kind of chemicals in biology that's creating a feeling like it feels to experience some of these things that's kind of like the hard question is what the heck is that and why is that useful maybe the more pragmatic question why is it useful to feel like this is really you experiencing this versus just like information being processed um it could be just a very nice side effect of um of the way we evolved that's just very useful to to uh feel a sense of uh ownership to the decisions you make to the perceptions you make to the model you're trying to maintain like you own this thing and it's the only one you got and if you lose it it's gonna really suck and so you should really send the brain some signals about it what ideas do you believe might be true that most or at least many people disagree with you with let's say in the space of machine learning well it depends who you talk about but i think so certainly there is uh a bunch of people who are nativists right who think that a lot of the basic things about the world are kind of hardwired in our you know minds things like you know the world is three-dimensional for example is that hardwired things like uh you know object permanence is something that we learn uh you know before the age of three months or so or are we born with it and there are you know very disa you know white disagreement among the you know cognitive scientists for this i think those things are actually very simple to learn um you know is it the case that the oriented edge detectors in v1 are learned or are they hardwired i think they are learned they might be learned before both because it's really easy to generate signals from the retina that actually will train edge detectors so um and again those are things that can be learned within minutes of uh opening your eyes right i mean you know since the 1990s we have algorithms that can learn oriented detectors completely unsupervised with the equivalent of a few minutes of real time so uh so those things have to be learned um and there's also those you know mit experiments where you kind of plug the optical nerve on the auditory cortex of a baby ferret right and that auditory cortex becomes a visual cortex essentially so you know clearly there's running taking place there so you know i think a lot of what people think are so basic that they need to be hardwired i think a lot of those things are learned because they are easy to learn jesus so you put a lot of value in the power of learning what kind of things do you suspect might not be learned is there something that could not be learned so your intrinsic drives are not learned they they there are the things that you know make humans uh human or make you know cats different from dogs right it's the the basic 
drives that are kind of hard-wired in our basal ganglia i mean there are people who are working on on this kind of stuff that's called intrinsic motivation in the context of reinforcement learning um so these are objective functions where the reward doesn't come from the external world it's computed by your own brain your own brain computes whether you're happy or not right it measures your degree of uh comfort or in comfort and and because it's your brain computing this presumably knows also how to estimate gradients of this right so um so it's easier to to learn when your objective is is intrinsic so that has to be hardwired the critic that makes long-term prediction of the outcome which is the eventual result of this that's learned and perception is learned and your model of the world is learned but let me take take an example of you know why the critic i mean example of how the critic might be learned right if i uh if i come to you um you know i reach across the table and i pinch your arm right complete surprise for you you would not have expected this i was expecting that the whole time but yes right let's say for the sake of the story yes um okay your visual ganglia is going to light up because it's going to hurt right and now your model of the world includes the fact that i may pinch you if i approach my uh my uh don't trust humans right my hand to your arm so if i try again you're gonna recoil and that's your critic uh your predictive you know your predictor of your uh ultimate pain uh uh system that predicts that something bad is going to happen when you recoil right to avoid it so even that can be learned that is drawing definitely this is what allows you also to uh you know define some goals right so um the fact that you know you're a school child you wake up in the morning and you go to school and you know it's not because you necessarily like waking up early and going to school but you know that there is a long-term objective you're trying to optimize so ernest becker i'm not sure if you're familiar with the philosopher he wrote the book denial of death and his idea is that one of the core motivations of human beings is our terror of death our fear of death that's what makes us unique from cats cats are just surviving they do not have a deep under like cognizance introspection that over the horizon is the end and he says that i mean there's a terror management theory that just all these psychological experiments that show the basically this idea that all of human civilization everything we create is kind of trying to forget if even for a brief moment that we're going to die when when do you think humans understand that they're going to die is it learned early on also like i don't know at what point i mean it's a it's a question like you know at what point do you realize that you know what death really is and i think most people don't actually realize what death is right i mean most people believe that you go to heaven or something right well so to push back on that what ernest becker says and um sheldon solomon all of those folks and i find those ideas a little bit compelling is that there is moments in life early in life a lot of this fun happens early in life when you are uh when you do deeply experience the terror of this realization and all the things you think about about religion all those kinds of things that we kind of think about more like teenage years and later we're talking about way earlier no it's like seven or eight years or something like that yeah you realize 
holy crap this is uh like the mystery the terror like it's almost like you're a little prey a little baby deer sitting in the darkness of the jungle of the woods looking all around you the darkness full of terror i mean that's that realization says okay i'm gonna go go back in the comfort of my mind where there's a well there is a deep meaning where there's a maybe like pretend i'm immortal however way however kind of idea i can construct to help me understand that i'm immortal religion helps with that you can you can delude yourself in all kinds of ways like lose yourself in the busyness of each day have little goals in mind all those kinds of things to think that it's going to go on forever and you kind of know you're going to die yeah and it's going to be sad but you don't really understand that you're going to die and so that's that's their idea and if i find that compelling because it does seem to be a core unique aspect of human nature that we were able to think that we're going we're able to really understand that this life is finite that seems important there's a bunch of different things there so first of all i don't think there is a qualitative difference between between us and cats in the term i think the difference is that we just have a better long-term ability to predict you know in the long term and so we have a better understanding of how the world works so we have better understanding of you know finance of life and things like that so we have a better planning engine than cats yeah okay um but what's the motivation for for planning well i think it's just a side effect of the fact that we have just a better planning engine because it makes us uh as i said you know the essence of intelligence is the ability to predict and so the because we're smarter as a side effect we also have this ability to kind of make predictions about our own future existence or lack thereof okay you say religion helps with that i think religion hurts actually it makes people worry about like you know what's going to happen after their death etc if you believe that you know you just don't exist after that so like you know it solves completely the problem at least you're saying if you don't believe in god you don't worry about what happens after death yeah i don't know why you worry about the about you know this life because that's the only one you have i think it's well i don't i don't know if i were to say what ernest becker says and i said i agree with him more uh than not is um you do deeply worry uh if you if you believe there's no god there's still a deep worry like of the mystery of it all like how does that make any sense that it just ends i don't think we can truly understand that this ride i mean so much of our life the consciousness the ego is uh invested in this in this being and then science keeps bringing humanity down from its pedestal and yeah that's another another example of it that's wonderful but for us individual humans we don't like to be brought down from a pedestal like but see you're fine with it because well so what ernest becker would say is you're fine with it because that's just a more peaceful existence for you but you're not really fine you're hiding from in fact some of the people that experience the deepest trauma uh that earlier in life they often before they seek extensive therapy will say i'm fine it's like when you talk to people who are truly angry how are you doing i'm fine the question is what's going on now i had a near death experience i had a very bad uh 
motorbike accident when i was 17. so but that didn't have any impact on my reflection on that topic so i'm basically just playing a bit of a devil's advocate pushing back and wondering is it truly possible to accept death and the flip side that's more interesting i think for ai and robotics is how important is it to have this as one of the suite of motivations is to not just avoid falling off the roof or something like that but ponder the the end of the ride if you listen to the stoics it's uh it's a great motivator it adds a sense of urgency so maybe to truly fear death or be cognizant of it might give a deeper meaning and urgency to the moment to live fully well maybe i don't disagree with that uh i mean i think what motivates me here is uh you know knowing more about about human nature i mean i think uh human nature and human intelligence is a big mystery it's a scientific mystery uh in addition to you know philosophical and etc but you know i'm a true believer in science so um and and and i do have kind of a belief that for complex systems like like the brain on the mind the the way to understand it is try to reproduce it with you know artifacts that you build because you know what's essential to it when you try to build it you know the same way i've used this analogy before with you i believe um the same way we only started to understand uh aerodynamics when we started building airplanes and that helped us understand how birds fly you know so i think there's kind of a similar process here where we don't have a theory of a full theory of intelligence but building you know intelligent artifacts will help us perhaps develop some you know underlying theory that encompasses not just artificial implements but also human and biological intelligence in general so you're an interesting person to ask this question about sort of all kinds of different other intelligent entities or intelligences what are your thoughts about kind of like the touring or the chinese room question if we create an ai system that exhibits a lot of properties of intelligence and consciousness how comfortable are you thinking of that entity as intelligent or conscious so you're trying to build now systems that have intelligence and there's metrics about their performance but that metric is external okay so how are you are you okay calling a thing intelligent are you going to be like most humans and be uh once again unhappy to be brought down from a pedestal of consciousness slash intelligence no i'm i'll be very happy to understand more about human nature human mind and human intelligence through the construction of machines that have similar abilities and if a consequence of this is to bring down humanity one notch down from it's already okay i'm just fine with it that's just the reality of life um so i'm fine with that now you were asking me about things that uh opinions i have that a lot of people may disagree with i think uh if we think about the design of an autonomous intelligence system so assuming that we are somewhat successful at some at some level of getting machines to learn models of the world predicting models of the world we have we build intrinsic motivation objective functions to drive the behavior of that system the system also has perception modules that allows it to estimate the state of the world and then have some way of figuring out the sequence of actions that you know to optimize a particular objective if it has a critic of the type that was describing before the thing that makes you recall your 
arm the second time i tried to pinch you um intelligent autonomous machine will have emotions i think emotions are an integral part of autonomous intelligence if you have an intelligent system that is driven by intrinsic motivation by objectives if it has a critic that allows you to predict in advance whether the outcome of a of a situation is going to be good or bad is going to have emotions it's going to have fear yes when it predicts that the outcome is gonna is gonna be bad and and something to avoid is gonna have elation when it predicts it's gonna be good um uh if it has drives to relate with humans um you know in some ways the way humans have um you know it's it's gonna be social right and so it's gonna have emotions about attachment and and things of that type so um so i think uh you know the the sort of sci-fi thing where you know you see commander data like having an emotion chip that you can turn off right i think that's ridiculous so i mean here's the difficult philosophical social question do you think there will be a time like a civil rights movement for robots where um okay forget the movement but a discussion like the supreme court that particular kinds of robots you know particular kinds of systems um deserve the same rights as humans because they can suffer just as humans can all those kinds of things well perhaps perhaps not like imagine that humans were that that you could uh you know die and be restored like you know you could be sort of you know be 3d reprinted and you know your brain could be reconstructed in its finest details our ideas of rights will change in that case if you can always just there's always a backup you could always restore maybe like the importance of murder will go down one notch that's right but also the uh your your you know desire to do dangerous things like you know you know doing skydiving or or you know or or you know race car driving you know car racing all that kind of stuff you know would probably increase or or you know airplane aerobatics or that kind of stuff right yeah it would be fine to do a lot of those things or explore you know dangerous areas and things like that it would kind of change your relationship so now it's very likely that robots would be like that because you know they'll be based on perhaps technology that is somewhat similar to today's technology and you can you can always have a backup so it's possible i don't know if you like video games but there's a there's a game called diablo and um oh my my sons are huge fans of this yes uh and in fact they made a game that's inspired by it awesome like built a game my three sons have a game design studio between them yeah that's awesome they came out with a game like it just came out nice again last year no this was last year earlier about a year ago that's awesome but so in diablo there's a something called hardcore mode which if you die there's no you're gone right that's it and so it's possible with ai systems for them to be able to operate successfully and for us to treat them in a certain way because they have to be integrated in human society they have to be able to die no copies allowed in fact copying is illegal it's possible with humans as well like cloning will be illegal even what's possible because cloning is not copying right i mean you don't reproduce the the mind of the person and like experience right it's just a delay twin so but then it's what we were talking about with computers that you'll be able to copy you right you'll be able to perfectly save pickle 
the the the mind state and it's possible that that would be illegal because that goes against um that will destroy the motivations of the system okay so let's say you you have a domestic robot okay sometime in the future yes and uh the domestic robot you know comes to you kind of somewhat pre-trained you know it can do a bunch of things yes but it has a particular personality that makes it slightly different from the other robots because that makes them more interesting and then because it's you know it's live with you for five years you've you've grown some attachment to it and vice versa and it's learned a lot about you or maybe it's not a household robot maybe it's uh maybe it's a virtual assistant that lives in your you know augmented reality glasses or whatever right uh you know the horror movie type thing right um and that system to some extent the the intelligence in that system is a bit like your child or maybe your phd student in a sense that there's a lot of you in that in that machine now right yeah and so if it were a living thing you would do this for free if you want right if it's your child your child can you know then live his or her own life and you know the fact that they learn stuff from you doesn't mean that you have any ownership of it right yeah but if it's a robot that you've trained perhaps you have some uh yeah intellectual property claim about intellectual property oh i thought you meant like uh permanent value in the sense this part of you is in well there is permanent value right so you would lose a lot if that robot were to be destroyed and you you had no backup you would lose a lot you know you a lot of investment you know kind of like a uh you know a person dying you know um that that a friend of a friend of you was dying or or a co-worker or something like that um but also uh you have like intellectual property rights in the sense that that that system is fine-tuned to your particular existence so that's now a very unique instantiation of that original background model whatever it was that arrived and then there are issues of privacy right because now imagine that that robot has its own kind of volition and decides to work from someone else yes or kind of you know thinks life with you is sort of untenable or whatever right um now all the things that that system learned from you uh you know how can you like you know delete all the personal information that that system knows about you yeah i mean that would be kind of an ethical question like you know can you erase the the mind of a of a intelligent robot uh to protect your your privacy yeah you can't do this with humans you can ask them to shut up but that you don't have complete power over them can't erase humans yeah it's the problem with relationships you know that you break up you can't you can't erase the other human with robots i think it'll have to be the same thing with robots that that risk that there has to be um some risk to our interactions to truly experience them deeply it feels like so you have to be able to lose your robot friend and that robot friend to go tweeting about how much of an you are but then are you allowed to you know murder the robot to protect your private information yeah probably decides to leave i have the situation that for robots with with certain like it's almost like uh regulation if you declare your robot to be let's call it sentient or something like that like this this robot is designed for human interaction then you're not allowed to murder these robots it's the same 
as murdering other humans well but what about you do a backup of the robot you do preserve on the on on a hard drive or the equivalent in the future that might be illegal just like it's like priority uh piracy is illegal but it's your own it's your own robot right but you can't you don't but then but then you can wipe out his brain so the this robot doesn't know anything about you anymore but you still have technically a certain existence because you backed it up and then there'll be these great speeches at the supreme court by saying oh sure you can erase the mind of the robot just like you can erase the mind of a human we both can suffer there'll be some epic like obama type character with a speech that we we like the robots and the humans are the same we can both suffer we can both hope we can both all those all those kinds of things raise families all that kind of stuff it's it's uh interesting for these just like you said emotion seems to be a fascinatingly powerful aspect of human human interaction human robot interaction and if they're able to exhibit emotions at the end of the day that's probably going to have us deeply consider human rights like what we value in humans what we value in other animals that's why robots and ai is great it makes us ask uh really good questions the hard questions yeah but you ask about you asked about the chinese room type argument you know is it real if it looks real yeah i think the chinese room argument is the ridiculous one so so so for people who don't know chinese room is uh you ca you can i don't even know how to formulate it well but basically you can mimic the behavior of an intelligent system by just following a giant algorithm code book that tells you exactly how to respond in exactly each case but is that really intelligent it's like a giant lookup table when this person says this you answer this when this person says this you answer this and if you understand how that works you have this giant nearly infinite lookup table is that really intelligence because intelligence seems to be a mechanism that's much more interesting and complex than this lookup table i don't think so so the i mean the real question comes down to do you think uh you know you can you can mechanize uh intelligence in some way even if that involves uh learning and the answer is of course yes there's no question there's a second question then which is uh assuming you can uh reproduce intelligence in sort of different hardware than biological hardware you know like computers uh can you you know match uh human intelligence in all the domains in which humans are intelligent is it possible right so that's the hypothesis of a strong ai the answer to this in my opinion is unqualified yes this will swell happen at some point there's no question that machines at some point will become more intelligent than humans in all domains where humans are intelligent this is not for tomorrow it's going to take a long time regardless of what you know elon and others have claimed or believed this is a lot a lot harder than many of many of those guys think it is and many of those guys who thought it was simpler than that years you know five years ago now i think it's hard because it's been five years and they realize it's it's gonna take a lot longer that includes a bunch of people deepmind for example but um oh interesting i haven't actually uh touched base with the deepmind folks but some of it elon or uh democracy i mean sometimes your role you have to kind of create deadlines that are 
nearer than farther away yeah to kind of create an urgency because you have to believe the impossible is possible in order to accomplish it and there's of course a flip side to that coin but it's a weird one you can't be too cynical if you want to get something done absolutely i agree with that i mean you have to inspire people right to work on certain ambitious things so it's certainly a lot harder than we believe but there's no question in my mind that this will happen and now people are kind of worried about what does that mean for humans they are going to be brought down from their pedestal a bunch of notches with that and is that going to be good or bad i mean it's just going to give us more power right it's an amplifier for human intelligence really so speaking of doing cool ambitious things fair the facebook ai research group has recently celebrated its 8th birthday or maybe you can correct me on that looking back what have been the successes the failures the lessons learned from the eight years of fair and maybe you can also give context on where the newly minted meta ai fits in how does it relate to fair right so let me tell you a little bit about the organization of all this yeah fair was created almost exactly eight years ago it wasn't called fair yet it took that name a few months later at the time i joined facebook there was a group called the ai group that had about 12 engineers and a few scientists like ten engineers and two scientists or something like that i ran it for three and a half years as a director hired the first few scientists and kind of set up the culture and organized it explained to the facebook leadership what fundamental research was about and how it can work within industry and how it needs to be open and everything and i think it's been an unqualified success in the sense that fair has simultaneously produced top-level research and advanced the science and the technology provided tools open source tools like pytorch and many others but at the same time has had a direct or mostly indirect impact on facebook at the time now meta in the sense that a lot of the systems that meta is built around now are based on research projects that started at fair so if you were to take deep learning out of facebook services now and meta more generally i mean the company would literally crumble i mean it's completely built around ai these days and it's really essential to the operations so what happened after three and a half years is that i changed role i became chief scientist so i'm not doing day-to-day management of fair anymore i'm more of a kind of think about strategy and things like that and i conduct my own research i have my own kind of research group working on self-supervised learning and things like this which i didn't have time to do when i was director so now fair is run by joelle pineau and antoine bordes together because fair is kind of split in two now there's something called fair labs which is sort of bottom-up scientist-driven research and fair accel which is slightly more organized for bigger projects that require a little more focus and more engineering support and things like that so joelle leads fair labs and antoine bordes leads fair accel where are they located it's delocalized all over so there's no
question that the leadership of the company believes that this was a very worthwhile investment and what that means is that uh it's it's there for the long run right so there is uh if you if you want to talk in these terms which i don't like there's a there's a business model if you want where where uh fair despite being a very fundamental research lab brings a lot of value to the company either mostly indirectly through other groups now what happened three and a half years ago when i stepped down is was also the creation of facebook ai which was basically a larger organization that covers fare so fair is included in it but also has other organizations that are uh focused on applied research or advanced development of ai technology that is more you know focused on the products of the company so less emphasis on fundamental research less fundamental but it's still research i mean there's a lot of papers coming out of those organizations and uh people are awesome awesome and you know wonderful to interact with and but it serves uh as kind of uh a way to you kind of scale up if you want um sort of ai technology which you know may be very experimental and and sort of lab prototypes into things that are usable so fair is a subset of meta ai it's fair become like kfc it it'll just keep the f nobody cares what the f stands for we'll know soon enough uh by uh probably probably by the end of the of 2021 this is not a giant change mayor fair well mayor doesn't sound too good but you know the the brand people are kind of deciding on this and they've been hesitating for for a while now and they you know they tell us they're going to come up with an answer as to whether fair is going to change name or whether we're going to change just the meaning of the f oh that's a good call i would keep fair and change the meaning of the f that would be my preference you know i would tend i would turn the f into fundamental oh that's what i researched oh that's really good yeah then meta ai so this would be fair affair yeah but you know people will call it fair right yeah exactly i like it and now meta ai uh is part of the reality lab so you know meta now the new facebook is called meta and it's kind of divided into you know facebook instagram whatsapp and reality lab and reality lab is about you know ar vr uh you know telepresence communication part uh technology and stuff like that that's kind of the you can think of it as the sort of a combination of um sort of new products and and technology part of uh of uh meta is that where the touch sensing for robots i saw that you were posting about that's that's what i touched on for robotics party fair actually that's that's it oh it is okay yeah this is also the no but there is the the other way the the haptic glove right yes that has like that's more reality that's that's reality lab research i have to have research but by the way the touch sensors are super interesting uh like integrating that modality into the whole uh sensing uh suite is very interesting so uh what do you think about the metaverse what do you think about this whole uh this whole kind of expansion of the view of the role of facebook and meta in the world well i made a verse really should be thought of as the next step in the internet right sort of trying to kind of you know make the experience more compelling of you know being connected either with other people or with content and you know we are evolved and trained to evolve in you know 3d environments where uh you know we can see other people we 
can talk to them when when we're near them or you know and other people are far away can hear us you know things like that right so it it there's a lot of social conventions that exist in the real world that we can try to transpose now what is going to be eventually the the uh how compelling is it going to be like our you know uh is it going to be the case that people are going to be willing to do this if they have to wear you know a huge pair of goggles all day maybe not right but then again if the experience is sufficiently compelling maybe so or if the device that you have to wear is just basically a pair of glasses you know technology makes sufficient progress for that um you know ar is a much easier concept to grasp that you're going to have you know augmented reality glasses that basically contain some sort of you know virtual assistant that can help you in your daily lives but at the same time with the ar you have to contend with reality with vr you can completely detach yourself from reality so it gives you freedom it might be easier to design worlds in in vr yeah but you you can imagine how you know the metaverse being a mix a mix right or or like you can have objects that exist in the metaverse that you know pop up on top of the real world or only exist in virtual reality okay let me ask the hard question oh because all of this was easy so this was easy uh the facebook now meta the social network has been painted by the media as a net negative for society even destructive and evil at times you've pushed back against this defending facebook can you explain your defense yeah so the the description the company that is being described in the in some media uh is not the company we know when we work inside and you know it could be claimed that a lot of employees are uninformed about what really goes on in the company but you know i'm a vice president i mean i have a pretty good vision of what goes on you know i don't know everything obviously i'm not involved in in everything but certainly not in decision about like you know content moderation or anything like this but but i have you know some decent vision of what goes on and this evil that is being described i just don't see it and then you know i think there is an easy story to buy which is that you know all the bad things in the in the world and you know the the reason your friend believe crazy stuff um you know there's an easy scapegoat right in the uh in in in social media in general uh facebook in particular but you have to look at the data like is it the case that uh facebook for example uh polarizes people politically um are there academic studies that show this is it the case that uh you know teenagers uh think of themselves less if they use instagram more is it the case that uh you know people get more riled up against you know opposite sides in a in a debate or political opinion if they if they are more on facebook or if they are less and study after study show that none of this is true this is independent studies by academic they're not funded by facebook or meta um you know study by stanford by some of my colleagues at nyu actually with whom i have no connection um you know there's a study recently they they paid people i think it was in um in in the former yugoslavia i'm not exactly sure in what what part but they paid people to not use facebook for a while in the period before the anniversary of the serenity massacres right so you know people get riled up like should you know should we have a celebration i mean a 
memorial kind of celebration for it or not so they paid a bunch of people to not use facebook for a few weeks it turns out that those people ended up being more polarized than they were at the beginning and the people who were more on facebook were less polarized there's a study you know from stanford of uh economists at stanford that tried to identify the causes of uh increasing polarization in the u.s and it's been going on for 40 years before you know mark zuckerberg was born yeah uh continuously and um and uh so if there is a cause it's not facebook or social media so you could say social media just accelerated but no i mean it's basically a continuous uh evolution by some measure of polarization in the us and then you compare this with other countries like uh the the west half of germany because you can't go 40 years in east eastside or denmark or or other countries and they use facebook just as much and they're not getting more polarized they're getting less polarized so if you want to look for you know a causal relationship there you can find a scapegoat but you can't find the cause now if you want to fix the problem you have to find the right cause and what rise me up is that people now are accusing facebook of bad deeds that are done by others and those others are we're not doing anything about them and by the way those others include the owner of the wall street journal in which all of those papers were published so i should mention that i'm talking to shrek mike schrepp for on this podcast and also mark zuckerberg and probably these conversations you can have with them because it's very interesting to me even if facebook has some measurable negative effect you can't just consider that in isolation you have to consider about all the positive ways that it connects us so like every technology there's people it's that question you can't just say like uh there's an increase in division yes probably google search engine has created increase in division we have to consider about how much information it brought to the world like i'm sure wikipedia created more division if you just look at the division we have to look at the full context of the world and didn't make a better world yeah the printing press has created more differences right exactly so you know when the printing press was invented uh the first books that were that were printed were things like the bible and that allowed people to read the bible by themselves not get the message uniquely from priests in europe and they created you know the protest movement and 200 years of religious persecution and wars so that's a bad side effect of the printing press you know social networks aren't being nearly as bad as the printing press but nobody would say that printing price was a bad idea yeah a lot of it's perception and there's a lot of different incentives operating here um maybe a quick comment since you're one of the top leaders at facebook and at meta sorry that's in the tech space i'm sure facebook involves a lot of incredible technological uh challenges that need to be solved a lot of it probably is on the computer infrastructure the hardware the i mean it's just a huge amount maybe can you give me context about how much of shrek's life is ai and how much of it is low level compute how much of it is flying all around doing business stuff and the same with zuckerberg mark zuckerberg they really focus on ai i mean certainly uh in the uh in the run-up of the creation affair and for you know at least a year after that if not more 
mark was very much focused on ai and was spending quite a lot of effort on it, and that's his style: when he gets interested in something, he reads everything about it. he read some of my papers, for example, before i joined, and so he learned a lot about it. took notes, right. and schrep was really into it also. i mean, schrep really has something i've tried to preserve too, despite my not-so-young age, which is a sense of wonder about science and technology, and he certainly has that. he's also a wonderful person, in terms of, as a manager, dealing with people and everything. mark also, actually. so they're very human people; in the case of mark, shockingly human, given his trajectory. the personality of him that is painted in the press is just completely wrong. yeah, but you have to know how to play the press, so i put some of that responsibility on him too. it's like the conductor of an orchestra: you have to play the press and the public in a certain kind of way where you convey your true self to them, if there is a depth and kindness. it's hard. it's hard, and he's probably not the best at it, so yeah, you have to learn. and it's sad to see, and i'll talk to him about it, that schrep is slowly stepping down. it's always sad to see folks be there for a long time and then slowly, i guess, move on. i think he's done the thing he set out to do, and he's got family priorities and stuff like that, and i understand. after 13 years or something, it's been a good run. which in silicon valley is basically a lifetime. yeah, because it's dog years. so neurips, the conference, just wrapped up. let me go back to something else you posted: the paper you co-authored was rejected from neurips, as you said proudly, in quotes, "rejected". you can joke about it. yeah, i know. can you describe this paper and what was the idea in it, and also, maybe this is a good opportunity to ask: what are the pros and cons, what works and what doesn't, about the review process? yeah, let me talk about the paper first; we'll talk about the review process afterwards. the paper is called vicreg, i mentioned it before, variance-invariance-covariance regularization, and it's a non-contrastive learning technique for what i call joint embedding architectures; siamese nets are an example of a joint embedding architecture. so let me back up a little bit. if you want to do self-supervised learning, you can do it by prediction. let's say you want to train your system to predict video: you show it a video clip and you train the system to predict the continuation of that video clip. now, because you need to handle uncertainty, because there are many continuations that are plausible, you need a way for the system to be able to produce multiple predictions, and the only way i know to do this is through what's called a latent variable. so you have some sort of hidden vector, a variable that you can vary over a set or draw from a distribution, and as you vary this vector over a set, the output, the prediction, varies over a set of plausible predictions. that's what i call a generative latent variable model.
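as a rough illustration of the idea just described, here is a minimal sketch under my own assumptions (hypothetical module and dimensions, not code from the conversation) of a generative latent-variable predictor, where sweeping the latent z sweeps the prediction over a set of plausible continuations:

    import torch
    import torch.nn as nn

    class LatentPredictor(nn.Module):
        # predicts a representation of the next clip from a context vector plus a latent z
        def __init__(self, ctx_dim=128, z_dim=16, out_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(ctx_dim + z_dim, 256), nn.ReLU(),
                nn.Linear(256, out_dim),
            )

        def forward(self, ctx, z):
            # ctx: encoding of the observed frames; z: latent drawn from a prior
            return self.net(torch.cat([ctx, z], dim=-1))

    model = LatentPredictor()
    ctx = torch.randn(1, 128)  # stand-in for the representation of the past frames
    # varying z over samples from a gaussian prior yields multiple plausible predictions
    predictions = [model(ctx, torch.randn(1, 16)) for _ in range(5)]
    # training such a model requires some way to pick or marginalize over z,
    # e.g. a vae-style encoder, best-of-k sampling, or per-example optimization of z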
now, there is an alternative to this for handling uncertainty: instead of directly predicting the next frames of the clip, you also run those through another neural net. so you now have two neural nets, one that looks at the initial segment of the video clip and another one that looks at the continuation, during training. and what you're trying to do is learn a representation of those two video clips that is maximally informative about the video clips themselves, but such that you can easily predict the representation of the second video clip from the representation of the first one. you can formalize this in terms of maximizing mutual information, things like that, but it doesn't really matter: what you want is informative representations of the two video clips that are mutually predictable. what that means is that there are a lot of details in the second video clip that are irrelevant. let's say the video clip consists of a camera panning a scene: there's going to be a piece of the room that is going to be revealed, and i can somewhat predict what that room is going to look like, but i may not be able to predict the details of the texture of the ground and where the tiles end and stuff like that. those are irrelevant details that perhaps my representation will eliminate, and so what i need is to train this second neural net in such a way that whenever the continuation video clip varies over all the plausible continuations, the representation doesn't change. got it, so over the space of representations you're doing the same kind of thing as you're doing with similarity learning. right. so these are two ways to handle multimodality in a prediction: in the first one, you parameterize the prediction with a latent variable, but you predict pixels, essentially; in the second one, you don't predict pixels, you predict an abstract representation of the pixels, and you guarantee that this abstract representation has as much information as possible about the input but drops all the stuff that you really can't predict. i used to be a big fan of the first approach, and in fact in this paper with ishan misra, this blog post, the dark matter of intelligence, i was kind of advocating for it, and in the last year and a half i've completely changed my mind: i'm now a big fan of the second one. and it's because of a small collection of algorithms that have been proposed over the last year and a half or two years to do this, including vicreg, its predecessor called barlow twins, which i mentioned, a method from our friends at deepmind called byol, and a bunch of others now that work similarly. they're all based on this idea of joint embedding. some of them have an explicit criterion that is an approximation of mutual information; some others, like byol, work but we don't really know why. there have been lots of theoretical papers: is it batch norm? no, it's not batch norm, because we take it out and it still works, and so on. so there's a big debate. but the important point is that we now have a collection of non-contrastive joint embedding methods, which i think is the best thing since sliced bread. so i'm super excited about this, because i think it's our best shot for techniques that would allow us to build predictive world models and at the same time learn hierarchical representations of the world, where what matters about the world is preserved and what is irrelevant is eliminated.
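to make the non-contrastive criterion concrete, here is a minimal sketch of a vicreg-style variance-invariance-covariance loss; the coefficients and the hinge on the standard deviation follow the published recipe, but this is an illustrative reimplementation under my own assumptions (embedding tensors z_a, z_b from the two branches), not the authors' code:

    import torch
    import torch.nn.functional as F

    def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0):
        # z_a, z_b: [batch, dim] embeddings of two views (e.g. a clip and its continuation)
        n, d = z_a.shape

        # invariance term: the two embeddings should match
        sim = F.mse_loss(z_a, z_b)

        # variance term: keep each dimension's std above 1 to prevent collapse to a constant
        std_a = torch.sqrt(z_a.var(dim=0) + 1e-4)
        std_b = torch.sqrt(z_b.var(dim=0) + 1e-4)
        var = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()

        # covariance term: push off-diagonal covariance entries toward zero to decorrelate dimensions
        za = z_a - z_a.mean(dim=0)
        zb = z_b - z_b.mean(dim=0)
        cov_a = (za.T @ za) / (n - 1)
        cov_b = (zb.T @ zb) / (n - 1)
        off_diag = lambda m: m - torch.diag(torch.diag(m))
        cov = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d

        return sim_w * sim + var_w * var + cov_w * cov

the same loss applies unchanged to embeddings of images, text, or audio, which is the point made just below about modality-agnostic methods.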
by the way, the representation, the before and after, is it in the space of a sequence of images, or is it for single images? it could be for a single image or for a sequence, and it doesn't have to be images: this could be applied to text, it could be applied to just about any signal. i'm looking for methods that are generally applicable, that are not specific to one particular modality; it could be audio or whatever. got it. so what's the story behind this paper? this paper is describing one such method, this vicreg method. so this is co-authored: the first author is a student called adrien bardes, who is a resident phd student at fair paris, co-advised by me and jean ponce, who is a professor at école normale supérieure and also a research director at inria. this is a wonderful program in france where phd students can basically do their phd in industry, and that's kind of what's happening here. and this paper is a follow-up on the barlow twins paper by my former postdoc stéphane deny, with li jing, jure zbontar, and a bunch of other people from fair. and one of the main criticisms from reviewers is that vicreg is not different enough from barlow twins, but my impression is that it's barlow twins with a few bugs fixed, essentially, and in the end this is what people will use. but i'm used to stuff being rejected. so it might be rejected and actually be exceptionally cited, because people use it. well, it's already been cited a bunch of times. so the question, then, the deeper question, is about peer review and conferences. computer science as a field is kind of unique in that conferences are highly prized, and it's interesting because the peer review process there is similar to journals, but it's accelerated; well, not hugely, but it goes faster, and it's a nice way to get stuff out quickly, to be reviewed quickly, to present it to the community quickly; not quickly, but quicker. but nevertheless it has many of the same flaws of peer review: a limited number of people look at it, there's bias, if you have new ideas you're going to get pushback, there are self-interested people who can infer who submitted it and be cranky about it, all that kind of stuff. yeah, there are a lot of social phenomena there. there's one phenomenon, which is that because the field has been growing exponentially, the vast majority of people in the field are extremely junior. that's just a consequence of the field growing, so as the size of the field starts saturating, you will have less of that problem of reviewers being very inexperienced. a consequence of this is that young reviewers, i mean, there's a phenomenon where reviewers try to make their life easy, and making your life easy when reviewing a paper is very simple: you just have to find a flaw in the paper. so basically they see their task as finding flaws in papers, and most papers have flaws, even the good ones.
so it's easy to do that; your job is easier as a reviewer if you just focus on this. but what's important is: is there a new idea in that paper that is likely to have influence? it doesn't matter if the experiments are not that great, if the protocol is so-so, things like that, as long as there is a worthy idea in it that will influence the way people think about the problem, even if others make it better eventually. i think that's really what makes a paper useful. and so this combination of social phenomena creates a disease that has plagued other fields in the past, like speech recognition, where basically people chase numbers on benchmarks, and it's much easier to get a paper accepted if it brings an incremental improvement on a mainstream, well-accepted method or problem. and those are, to me, boring papers. i mean, they're not useless, because industry thrives on that kind of progress, but they're not the ones i'm interested in in terms of new concepts and new ideas. so papers that are really trying to strike out toward new advances generally don't make it. now, thankfully, we have arxiv. arxiv, exactly. and then there are open-review type situations, and, i mean, twitter is a kind of open review. i'm a huge believer that reviews should be done by thousands of people, not two people. i agree. and so, do you see a future, it's already the present, but a growing future, where a lot of really strong papers will just be on arxiv, and you're presenting at an ongoing, continuous conference called twitter, slash the internet, slash arxiv-sanity (andrej just released a new version)? just not being so elitist about this particular gating? it's not a question of being elitist or not; it's a question of recommendation, of seals of approval for people who don't see themselves as having the ability to evaluate papers by themselves. it saves time: if you rely on other people's opinions, and you trust those people or those groups to evaluate a paper for you, that saves you time, because you don't have to scrutinize the paper as much; it is brought to your attention. it's the whole idea of a collective recommender system. so i actually thought about this a lot, about 10 or 15 years ago, because there were discussions at nips, and we were about to create iclr with yoshua bengio, and so i wrote a document describing a reviewing system. basically: you post your paper on some repository, let's say arxiv, or now it could be openreview, and then you can form a reviewing entity, which is equivalent to a reviewing board of a journal or a program committee of a conference; you have to list the members. that reviewing entity can choose to review a particular paper spontaneously, or not. there is no exclusive relationship anymore between a paper and a venue or reviewing entity: any reviewing entity can review any paper, or may choose not to, and then give an evaluation. it's not publish-or-not-publish, it's just an evaluation and a comment, which would be public and signed by the reviewing entity; and if it's signed by the reviewing entity, it's by one of the members of that reviewing entity. so the reviewing entity could be, you know, "lex fridman's preferred papers". right, it's like fridman writing a review. yes.
so for me that's a beautiful system, i think. but in addition to that, it feels like there should be a reputation system for the reviewers, for the reviewing entities. not the reviewers individually, the reviewing entities, sure. but even within that there are reviewers too, because there's another thing here: it's not just the reputation, it's an incentive for an individual person to do great work. right now, in the academic setting, the incentive is kind of internal, just wanting to do a good job, but honestly that's not a strong enough incentive to do a really good job of reading a paper and finding the beautiful amidst the mistakes and the flaws and all that kind of stuff. like, if you're the person who first discovered a powerful paper, and you get to be proud of that discovery, then that gives you a huge incentive. that's a big part of my proposal, actually. i described it as: if your evaluation of papers is predictive of future success, then your reputation should go up as a reviewing entity. yes, exactly. i even had a master's student, a master's student in library science and computer science, actually, work out exactly how that should function, with formulas and everything. so in terms of implementation, do you think that's something that's doable? i mean, i've been talking about this to various people, like andrew mccallum, who started openreview, and the reason we picked openreview for iclr initially, even though it was very early for them, is that my hope was that iclr was eventually going to inaugurate this type of system. so iclr kept the idea of open reviews, where the reviews are published with the paper, which i think is very useful, but in many ways it has kind of reverted to a more conventional type of conference for everything else. and i don't run iclr, i'm just the president of the foundation, and the people who run it should make decisions about how to run it; i'm not going to tell them, because they're volunteers and i'm really thankful that they do it. but i'm saddened by the fact that we're not being innovative enough. yeah, me too. i hope that changes, because the communication of science broadly, and the communication of computer science ideas in particular, is how you make those ideas have impact, i think. yeah, and i think a lot of this is because people have in their mind an objective which is fairness for authors, and the ability to count points, basically, and give credit accurately. but that comes at the expense of the progress of science, so to some extent we're slowing down the progress of science. and are we actually achieving fairness? we're not achieving fairness: we have biases; we're doing double-blind review, but the biases are still there, they're just different kinds of biases. you write that the phenomenon of emergence, collective behavior exhibited by a large collection of simple elements in interaction, is one of the things that got you into neural nets in the first place. i love cellular automata, i love simple interacting elements and the things that emerge from them. do you think we understand how complex systems can emerge from such simple components that interact simply? no, we don't. it's a big mystery. it's also a mystery for physicists, a mystery for biologists: how is it that the universe around us seems to be increasing in complexity and not decreasing?
i mean, that is a kind of curious property of physics: despite the second law of thermodynamics, evolution and learning and so on seem to be able, at least locally, to increase complexity, not decrease it. so perhaps the ultimate purpose of the universe is to just get more complex, to have these small pockets of beautiful complexity. do cellular automata, these kinds of emergence and complex systems, give you some intuition, or guide your understanding of machine learning systems and neural networks and so on, or are these for you, right now, disparate concepts? well, it's what got me into it. i discovered the existence of the perceptron when i was a college student, by reading a book; it was a debate between chomsky and piaget, and seymour papert from mit was kind of singing the praises of the perceptron in that book, and it was the first time i heard about a learning machine. so i started digging through the literature, and i found those books which were basically transcriptions of workshops or conferences from the 50s and 60s about self-organizing systems. there was a series of conferences on self-organizing systems, and these books on them, some of which you can actually get at the internet archive in digital form, and there are fascinating articles in there by this guy whose name has been largely forgotten, heinz von foerster. he was an austrian physicist who emigrated to the us and worked on self-organizing systems in the 50s and 60s. at the university of illinois at urbana-champaign he created the biological computer laboratory, bcl, which was all about neural nets; unfortunately that was toward the end of the popularity of neural nets, so that lab never really thrived. but he wrote a bunch of papers about self-organization and the mystery of self-organization. an example he gives: imagine you are in space, there's no gravity, and you have a big box with magnets in it, rectangular magnets with a north pole on one end and a south pole on the other. you shake the box gently, and the magnets will stick to each other and probably form a complex structure, spontaneously. that could be an example of self-organization. and you have lots of examples; neural nets are an example of self-organization too, in many respects. and it's a bit of a mystery what is possible with this: pattern formation in physical systems, in chaotic systems, things like that, the emergence of life. how does that happen? it's a big puzzle for physicists as well. it feels like understanding the mathematics of emergence in some constrained situations might help us create intelligence, help us add a little spice to the systems, because in complex systems with emergence you seem to be able to get a lot from little, and so that seems like a shortcut to get big leaps in performance. but there's a missing concept that we don't have. yeah, and it's something i've also been fascinated by since my undergrad days: it's how you measure complexity. we don't actually have good ways of measuring complexity, or at least we don't have good ways of interpreting the measures that we have at our disposal.
like, how do you measure the complexity of something? there are all those things like kolmogorov-chaitin-solomonoff complexity: the length of the shortest program that would generate a bit string can be thought of as the complexity of that bit string. i've been fascinated by that concept. the problem with it is that this complexity is defined only up to a constant, which can be very large. there are similar concepts derived from bayesian probability theory, where the complexity of something is the negative log of its probability, essentially, and you have a complete equivalence between the two things. and there you would think the probability is something that's well defined mathematically, which would mean complexity is well defined, but it's not true: you need a model of the distribution, and you may need a prior if you're doing bayesian inference, and the prior plays the same role as the choice of the computer with which you measure your kolmogorov complexity. so every measure of complexity we have has some arbitrariness to it, an additive constant which can be arbitrarily large. so how can we come up with a good theory of how things become more complex if we don't have a good measure of complexity? yeah, which we need for this. one way people study this is in the space of biology: the people who study the origin of life try to recreate life in the laboratory, and the more interesting one, the alien one, is when we go to other planets: how do we recognize life? we associate complexity, maybe some level of mobility, with life; we have to be able to have concrete algorithms for measuring the level of complexity we see in order to know the difference between life and non-life. and the problem is that complexity is in the eye of the beholder. let me give you an example. if i give you an image of mnist digits, and i flip through the digits, there is obviously some structure to it, because there's local structure: neighboring pixels are correlated across the entire dataset. now imagine that i apply a random permutation to all the pixels, a fixed random permutation, and i show you those images: they will look really disorganized to you, more complex. in fact, they're not more complex in absolute terms; they're exactly the same as they were originally, and if you knew what the permutation was, you could undo it. now imagine i give you special glasses that undo that permutation: all of a sudden, what looked complicated becomes simple. so if you have humans on one end, and another race of aliens that sees the universe with permutation glasses: with the permutation glasses, what we perceive as simple is, to them, highly complicated; it's probably heat. and what they perceive as simple is, to us, random fluctuation; it's heat. so it's truly in the eye of the beholder; it depends what kind of glasses you're wearing, what kind of algorithm you're running in your perception system. so i don't think we'll have a theory of intelligence, self-organization, evolution, things like that, until we have a good handle on a notion of complexity, which we know is in the eye of the beholder.
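a toy illustration of that observer-dependence, my own example with an assumed compression proxy rather than something from the conversation, is to use compressed size as a crude stand-in for complexity and watch it change under a fixed, perfectly invertible pixel permutation:

    import zlib
    import numpy as np

    rng = np.random.default_rng(0)

    # a structured image: a smooth gradient, so neighboring pixels are correlated
    img = np.add.outer(np.arange(64), np.arange(64)).astype(np.uint8)

    # a fixed, invertible permutation of the pixels (the "permutation glasses")
    perm = rng.permutation(img.size)
    scrambled = img.ravel()[perm].reshape(img.shape)

    def proxy_complexity(a):
        # compressed size as a rough proxy for descriptive complexity
        return len(zlib.compress(a.tobytes(), level=9))

    print("original:", proxy_complexity(img))        # small: zlib sees the structure
    print("permuted:", proxy_complexity(scrambled))  # typically much larger, yet no information was added

    # nothing changed in absolute terms: the permutation can be undone exactly
    inverse = np.empty_like(perm)
    inverse[perm] = np.arange(perm.size)
    assert np.array_equal(scrambled.ravel()[inverse].reshape(img.shape), img)

the proxy is itself a choice of "glasses": a decoder that knew the permutation would assign both images the same cost, which is exactly the arbitrary-constant problem mentioned above.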
yeah. it's sad to think that we might not be able to detect or interact with alien species because we're wearing different glasses, because their notion of locality might be different from ours. yeah, and this actually connects with fascinating questions in physics at the moment, modern physics, quantum physics, like questions about whether we can recover the information that's lost in a black hole, and things like that, and that relies on notions of complexity, which i find fascinating. can you describe your personal quest to build an expressive electronic wind instrument, an ewi? what is it, and what does it take to build it? well, i'm a tinkerer; i like building things, i like building things with combinations of electronics and mechanical stuff. i have a bunch of different hobbies, but probably my first one, as a kid, was building model airplanes and stuff like that, and i still do that to some extent, but also electronics. i taught myself electronics before i studied it, and the reason i taught myself electronics is because of music: my cousin was an aspiring electronic musician and he had an analog synthesizer, and i was basically modifying it for him and building sequencers and stuff like that for him. i was in high school when i was doing this. that's interesting, like progressive rock, like the 80s? what's the greatest band of all time according to yann? there are too many of them, but it's a combination of, you know, mahavishnu orchestra, weather report, yes, genesis, gentle giant, things like that. great, okay. so this love of electronics and this love of music combined. i was actually playing baroque and renaissance music; i played in an orchestra when i was in high school and in my first years of college; i played the recorder, crumhorn, a little bit of oboe, things like that. so i'm a wind instrument player, but i always wanted to play improvised music, even though i don't know anything about it, and the only way i figured, short of learning to play the saxophone, was to play electronic instruments: the fingering is similar to a saxophone, but you have a wide variety of sounds, because you control a synthesizer with it. so i had a bunch of those, going back to the late 80s, from either yamaha or akai; they've been the main manufacturers of those, classically, going back several decades. but i've never been completely satisfied with them, because of a lack of expressivity, and those things are somewhat expensive. i mean, they measure the breath pressure, they measure the lip pressure, and you have various parameters you can vary with your fingers, but they're not really as expressive as an acoustic instrument. you hear john coltrane play two notes and you know it's john coltrane; he's got a unique sound. or miles davis: you can hear it's miles davis playing the trumpet, because the sound reflects their physiology, basically; the shape of the vocal tract shapes the sound. so how do you do this with an electronic instrument? many years ago i met a guy called david wessel; he was a professor at berkeley and created the center for music technology there, and he was interested in that question. so i kept thinking about this for many years, and finally, because of covid,
i was at home, in my workshop; my workshop also serves as my zoom room and home office. and this is in new jersey? in new jersey. and i started really getting serious about building my own ewi instrument. what else is going on in that new jersey workshop? is there some crazy stuff you've built, or left behind on the workshop floor? a lot of crazy stuff: electronics built with microcontrollers of various kinds, and weird flying contraptions. so you still love flight? it's a family disease. my dad got me into it when i was a kid; he was building model airplanes when he was a kid, and he was a mechanical engineer. he taught himself electronics also, so he built his own early radio-control systems in the late 60s and early 70s, and that's what got me into it; i mean, he got me into engineering and science and technology. do you also have an interest in, an appreciation of, flight in other forms, like drones, quadcopters, or is it the model airplane thing? you know, before drones were a consumer product, i built my own, also building a microcontroller with gyroscopes and accelerometers for stabilization, writing the firmware for it, and then when it became a standard thing you could buy, it was boring; i stopped doing it, it was not fun anymore. yeah, you were doing it before it was cool. what advice would you give to a young person today, in high school or college, who dreams of doing something big, like yann lecun, let's talk in the space of intelligence, who dreams of having a chance to solve some fundamental problems in the space of intelligence, both for their career and just in life, being somebody who was part of creating something special? so, try to get interested in big questions, things like: what is intelligence? what is the universe made of? what is life all about? even crazy big questions, like: what is time? nobody knows what time is. and then learn basic things, basic methods, either from math, from physics, or from engineering, things that have a long shelf life. like, if you have a choice between learning mobile programming on the iphone or quantum mechanics, take quantum mechanics, because you're going to learn things you had no idea existed. you may never be a quantum physicist, but you will learn about path integrals, and path integrals are used everywhere; it's the same formula that you use for bayesian integration and stuff like that. so the ideas, the little ideas within quantum mechanics, within some of these more solidified fields, will have a longer shelf life, and you'll somehow use them indirectly in your work. learn classical mechanics; you learn about lagrangians, for example, which is a hugely useful concept for all kinds of different things. learn statistical physics, because all the math that comes up in machine learning basically comes out of, was figured out by, statistical physicists in the late 19th and early 20th century, and some of it more recently, by people like giorgio parisi, who just got the nobel prize for the replica method, among other things; it's used for a lot of different things. variational inference, for instance: that math comes from statistical physics.
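as a gloss on the "same formula" remark, and this is my own schematic correspondence rather than a quote from the conversation, the euclidean path integral, the partition function of statistical physics, and the bayesian evidence all have the same shape:

    Z = \int \mathcal{D}x \, e^{-S[x]/\hbar}
    \qquad
    Z = \int dx \, e^{-E(x)/k_B T}
    \qquad
    p(D) = \int d\theta \, p(D \mid \theta)\, p(\theta) = \int d\theta \, e^{-U(\theta)},
    \quad U(\theta) = -\log\big[p(D \mid \theta)\, p(\theta)\big]

in each case one integrates an exponentiated "energy" over all configurations, which is why tools like saddle-point and variational approximations transfer between the fields.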
so a lot of those basic courses: if you do electrical engineering and you take signal processing, you'll learn about fourier transforms, again something super useful; it's at the basis of things like graph neural nets, which is an entirely new sub-area of ai, machine learning, deep learning that i think is super promising for all kinds of applications. something very promising, if you're more interested in applications, is the application of ai, machine learning, and deep learning to science, to science that can help solve big problems in the world. i have colleagues at meta, at fair; we started this project called open catalyst. it's an open, collaborative project, and the idea is to use deep learning to help design new chemical compounds or materials that would facilitate the separation of hydrogen from oxygen. if you can efficiently separate oxygen from hydrogen with electricity, you solve climate change, it's as simple as that, because you cover some random desert with solar panels, have them work all day producing hydrogen, and then you ship the hydrogen wherever it's needed. you don't need anything else: you have controllable power that can be transported anywhere. so if we have a large-scale, efficient energy-storage technology, like producing hydrogen, we solve climate change. here's another way to solve climate change: figuring out how to make fusion work. the problem with fusion is that you make a super hot plasma, and the plasma is unstable and you can't control it; maybe with deep learning you can find controllers that will stabilize the plasma and make practical fusion reactors. that's very speculative, but it's worth trying, because the payoff is huge. there's a group at google working on this, led by john platt. so basically, convert as many problems in science, in physics, biology, and chemistry, into learnable problems, and see if a machine can learn them. i mean, there are properties of complex materials that we don't understand from first principles, for example. so if we could design new materials, we could make more efficient batteries, maybe faster electronics; there are a lot of things we can imagine doing, or lighter materials for cars or airplanes, maybe better fuel cells. if we had good hydrogen fuel cells, we could use them to power airplanes, or cars, and we wouldn't have co2 emission problems for air transportation anymore. so there are a lot of those things where i think ai can be used, and this is not even talking about all the medicine and biology and everything like that, like protein folding: figuring out how to design a protein so that it sticks to another protein at a particular site, because that's how you design drugs in the end. so deep learning could be used for all of this, and it would be enormous progress if we could use it for that. here's an example from recent materials physics: you take a monoatomic layer of graphene, just carbon on a hexagonal mesh, a single atom thick, you put another one on top, you twist them by some magic number of degrees, three degrees or something, and it becomes a superconductor, and nobody has any idea why. i want to know how that was discovered. but that's the kind of thing: can machine learning actually discover things like that? well, maybe not, but there is a hint, perhaps, that with machine learning we could train a system to basically be a phenomenological model of some complex emergent phenomenon, and superconductivity is one of those, where the phenomenon is too difficult to describe from first principles with the usual reductionist methods, but we could have deep learning systems that predict the properties of a system from a description of it, after being trained with sufficiently many samples. this guy pascal fua, at epfl, has a startup company where he basically trained a convolutional net to predict the aerodynamic properties of solids, and you can generate as much data as you want by just running computational fluid dynamics: you give it a wing, an airfoil, a shape of some kind, you run computational fluid dynamics, and you get as a result the drag and the lift and all that stuff. so you can generate lots of data and train a neural net to make those predictions, and now what you have is a differentiable model of, let's say, drag and lift as a function of the shape of that solid, and so you can do backpropagation: you can optimize the shape so that you get the properties you want.
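a minimal sketch of that differentiable-surrogate loop, with a hypothetical shape parameterization and a toy stand-in for the solver rather than pascal fua's actual system: fit a net to simulator outputs, then run gradient descent on the shape through the frozen net:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # stand-in for an expensive cfd solver: maps 8 toy shape parameters to (drag, lift)
    def run_cfd(shapes):
        drag = (shapes ** 2).sum(dim=1, keepdim=True)
        lift = shapes.sum(dim=1, keepdim=True)
        return torch.cat([drag, lift], dim=1)

    surrogate = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                              nn.Linear(64, 64), nn.ReLU(),
                              nn.Linear(64, 2))
    opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

    # 1) train the surrogate on as much simulator data as you care to generate
    for _ in range(2000):
        shapes = torch.randn(128, 8)
        targets = run_cfd(shapes)          # in practice: offline cfd runs
        loss = F.mse_loss(surrogate(shapes), targets)
        opt.zero_grad(); loss.backward(); opt.step()

    # 2) optimize a shape through the frozen, differentiable surrogate:
    #    minimize predicted drag while keeping predicted lift above a target
    for p in surrogate.parameters():
        p.requires_grad_(False)
    shape = torch.randn(1, 8, requires_grad=True)
    shape_opt = torch.optim.Adam([shape], lr=1e-2)
    for _ in range(500):
        drag, lift = surrogate(shape).unbind(dim=1)
        objective = (drag + 10.0 * F.relu(1.0 - lift)).mean()
        shape_opt.zero_grad(); objective.backward(); shape_opt.step()

the same pattern, learn a differentiable stand-in for a slow simulator and then optimize inputs through it, is what makes the "design the shape to get the properties you want" step cheap.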
yeah, that's incredible. and on top of all that, you should probably read a little bit of literature and a little bit of history, for inspiration and for wisdom, because after all, all these technologies will have to work in the human world. yes. and the human world is complicated. yeah. this was an amazing conversation. i'm really honored that you talked with me today. thank you for all the amazing work you're doing at fair, at meta, and thank you for being so passionate, after all these years, about everything that's going on. you're a beacon of hope for the machine learning community. thank you so much for spending your valuable time with me today; that was awesome. thanks for having me on; it was a pleasure. thanks for listening to this conversation with yann lecun. to support this podcast, please check out our sponsors in the description. and now, let me leave you with some words from isaac asimov: your assumptions are your windows on the world. scrub them off every once in a while, or the light won't come in. thank you for listening, and hope to see you next time.
Info
Channel: Lex Fridman
Views: 452,714
Keywords: agi, ai, ai podcast, artificial intelligence, artificial intelligence podcast, facebook, lex ai, lex fridman, lex jre, lex mit, lex podcast, machine learning, meta, mit ai, neural network, nyu, reinforcement learning, self supervised learning, yann lecun
Id: SGzMElJ11Cc
Length: 165min 10sec (9910 seconds)
Published: Sat Jan 22 2022