NOAM CHOMSKY - THE GHOST IN THE MACHINE

Reddit Comments

For reference, here's a thread with some details about what Chomsky means by "the principles of language," or his distinction between "impossible languages" and possible ones.

More: https://abutler.com/is-language-innate/

πŸ‘οΈŽ︎ 1 πŸ‘€οΈŽ︎ u/pocket_eggs πŸ“…οΈŽ︎ May 10 2023 πŸ—«︎ replies

This is a funny title, because Chomsky has previously stated that, in general, there is no machine anymore, only the ghost.

The meaning being that the mind/body dichotomy no longer exists in any meaningful sense (which is where "the ghost and the machine" comes from), because the body was specifically defined by the mechanical philosophy, which has been falsified, or at least shown to be a dead end, by scientific progress, leaving only the ghost behind.

πŸ‘οΈŽ︎ 1 πŸ‘€οΈŽ︎ u/MasterDefibrillator πŸ“…οΈŽ︎ May 12 2023 πŸ—«︎ replies
Captions
"First we should ask the question whether large language models have achieved anything, anything in this domain. Answer: no, they've achieved zero."

Hello Street Talkers, welcome back to the channel. Today is a special episode for me personally, an emotional episode. We've been through quite a tumultuous journey to create this episode, for reasons which will become abundantly clear pretty soon. Noam Chomsky is an intellectual heavyweight, he's one of my personal heroes, and it truly is a dream come true for us to have this experience of interviewing him. I really hope you enjoy the show. Today we are coming live from Lake Como in Italy; I think this is possibly one of the most beautiful places on planet Earth, and, in a Douglas Hofstadter style, I hope that adds a little bit of flavor to the creativity behind this episode. Thank you so much to our patrons; I wanted to give a special shout-out to our VIP patrons Alex McNamara and Ebonia Elliot Lewis. Psychologically, if nothing else, it helps us so much having your support. We're also interested in finding sponsors for the show, and of course don't forget to join our amazing Discord community. We really appreciate our community and our supporters; thank you so much.

Welcome back to Street Talk. Today is absolutely incredible: our dream came true and we got to interview our hero, Professor Noam Chomsky. Walid couldn't even keep his composure; he was starstruck for pretty much the entire interview. Chomsky is an intellectual heavyweight, probably the most important intellectual of the 20th century, and he's been cited an enormous number of times. We're quite spoilt here on MLST, because Friston and Bengio are also in the top 100 scientists by h-index, but it was an incredible honor to speak with Chomsky. I still have ten Chomsky books on my bookshelf downstairs. I went through a certain political phase at university, and no harm in that; I think it was very enriching for me going forward in my life. He's been a towering figure in so many people's lives throughout the entire 20th century and beyond.

This is going to be a long show, over three and a half hours, so use the table of contents to skip around. This first chapter is mostly about Yann LeCun. The reason we wanted to talk about him and his recent paper is that, as a radical empiricist, he's the antithesis of Chomsky, and I think it frames some of the subsequent discussion nicely. If you want to skip ahead and only do the Chomsky material, it's about 50 minutes to skip; look at the table of contents.

So, chapter one.

"Really, the revolution that has happened over the last decades is the fact that we've realized that AI has to be intimately linked with learning: the fact that any animal with a brain that we observe in nature is capable of learning. And perhaps because I'm lazy, or because I don't think I'm very smart, I've always thought that it would be very difficult to just design an intelligent system from scratch. An intelligent system has to basically design itself, more or less, through learning."

LeCun just released a position paper called "A Path Towards Autonomous Machine Intelligence". I admire LeCun a lot, and I want to give him credit personally for explaining things in very simple terms. It's always very insightful reading his papers or listening to his lectures; note the lack of jargon. Even with the energy-based models, the high-level abstract pictorial formalism is something which is great for communicating science, but not so great if you don't want to get Schmidhubered, and we'll come back to that in a bit. He really knows everything in his space, he can essentialize complex ideas, and he's a wonderful scientist. However, let's not forget: he must be earning millions of dollars a year as a vice president of Facebook; even at an E7 level he could probably be into the millions. So I guess what I'm saying is, he's not going to rock the boat, is he? He's going to stay comfortable. They've already gone quite far in introducing the world to energy-based models, so that's the caveat.

Anyway, in his paper he lamented that our best systems are still far from matching human reliability in real-world tasks such as driving, even after being fed ridiculous amounts of supervisory data from human experts, after going through millions of reinforcement learning trials in virtual environments, and even after engineers have hardwired hundreds of behaviors into the model. So LeCun thinks the answer may lie in the ability of humans and many animals to learn world models, which is to say internal predictive models of how the world around us works. I thought the paper contained a lot of common sense when it comes to some of the pitfalls of the current approaches to AGI. In my personal opinion he's still a little bit too wedded to neural networks, but as we're going to discuss in this program, when it comes to symbolism and empiricism there's no middle way: you're either all in, or it's like a house of cards and it all comes crashing down.

He thinks the main challenges of AI research today are being able to learn passively, being able to learn efficiently, introducing System 2 models (in a similar vein to our friend François Chollet), and also this ability to learn abstractions and compositional semantics, which I guess we should lump together for the time being. He did actually recognize that compositional semantics, while important, is still beyond our reach; he doesn't really know how to achieve it within the current paradigm.

Now, on the blank slate: the prevailing wisdom from empiricists like LeCun and Rich Sutton (remember his "bitter lesson" essay) is that all handcrafted knowledge is bad, basically, and perception-derived knowledge is the only game in town. One thing I found quite ironic in this paper is that it's a huge step away from that wisdom. Of course they'll say that it isn't, but when you look at the architecture diagram (there was even a comedy version of it made by Christian Sergei), it does look like a rich, handcrafted cognitive architecture. I don't think it's hyperbolic to say it resembles the architectures that we used to build in the 1980s. Of course LeCun would argue that it's actually a level of abstraction above that, a prototypical architecture with learning, but in a way it's not, because all of those levels of abstraction are actually hard-coded levels of abstraction.

LeCun even showed a development graph of human cognitive skills, strongly implied to be a directed acyclic graph, which was derived only (or nearly only) from perceptual information. I thought that was a bit of a sleight of hand, because he strongly suggests that the sequential development of skills is evidence of the blank slate in human cognitive development, but then he went on to show that the levels of abstraction in his hierarchical models must be hard-coded. This isn't to say that he thinks neural networks today cannot learn abstractions deducible from the base abstraction prior, or inductive prior, which is to say the encoder and the prediction model. It gave me pause for thought, because presumably he does think that current neural networks can learn abstractions; that's the common wisdom. In my opinion they can't. Well, they learn a tiny dot of abstractions out of an infinite sea of abstractions. He also said that he thought the current language models were not artificial general intelligence, because they don't have these abstract latent variables to explore multiple interpretations of a percept and indeed search for an optimal course of actions to achieve a goal. It's not entirely clear to me whether he thinks the lack of Bayesian-style uncertainty quantification or the lack of a hard-coded abstraction hierarchy is the bigger reason why it's not AGI, but it's interesting to see him point out those two particular issues that he sees with current architectures.

For LeCun, we need to be able to dynamically switch between levels in an abstraction hierarchy; the levels should depend strictly on information derived from the level below only; and the abstractions, while handcrafted, should learn empirically, by predicting the representation of another concept in the same level of the hierarchy but pointing at another point in time or space. This is what LeCun means by passive or self-supervised learning. Unfortunately we don't have time in this episode to go into all of the JEPA material, and to be honest I'm glad we didn't spend the time, because Yannic has just made an episode on it, so you can think of this as supplementing Yannic's episode. I do think it's quite fascinating, though, so I'll say a couple of things here.

LeCun introduces latent variables of unnormalized energy to represent possible futures. As far as I'm concerned it's analogous to Bayesian-style probabilistic graphical models, from a philosophy and mental-framework point of view, where you have these latent or unobserved variables. In the context of LeCun's architecture we're learning the dependency between what is observed and what is not observed, but now we use this unnormalized energy instead of pure probability distributions, which means that the distributions don't have to sum to one, because doing so is usually an intractable operation. But they are actually interchangeable: you can quite easily convert energy distributions into probability distributions, though obviously it's not a perfect conversion. It's still possible to use neural networks as prediction modules in these architectures, and LeCun obviously would want to do that. The main change here is that the models become stochastic over the domain of the latent variable, which is to say there are many possible y's, or predictions, for a given x, or signal. LeCun then regularizes the latent to stop information leaking into the predictions, and also to shrink-wrap the representation around volumes of high density; as LeCun says, mostly to overcome the curse of dimensionality which you get with so-called contrastive methods. I always wondered what LeCun meant by contrastive versus non-contrastive: they're both contrastive in a sense, but the non-contrastive one has some regularization tricks to prevent mode collapse and to prevent problems with the curse of dimensionality. Anyway, we discuss self-supervised learning and contrastive models quite a lot in our interview with Ishan Misra, so why don't you go and check that out if you're interested, and watch Yannic's video as well.

All of his models, being self-supervised, non-contrastive models, learn by filling in the missing gaps in time or space, so they work mostly passively. His models are stacked vertically to capture concept or abstraction hierarchies, and then stacked in time to learn action-space abstractions, which is what he calls, Daniel Kahneman style, System 2, as in Thinking, Fast and Slow. He thinks that being able to find trajectories in action space is analogous to reasoning. I don't think it's analogous; well, I guess it is technically analogous, but it's at the wrong level of description, and you would need an infinite number of traversals to actually reason. I will make the argument later in the video that what we need is symbolics and compositionality, or what Fodor and Pylyshyn called systematicity.

The paper also has some fascinating discussion of uncertainty quantification. Again, we don't have time to get into it here, but I do recommend you check out the paper; I think it's a wonderful tour de force of energy-based models, joint embedding predictive architectures, and Yann LeCun's view on many topics in artificial general intelligence.

"Everything comes from observation, from sensory input." This has been crushed a long time ago. I'm even surprised that the likes of Yann LeCun still think that everything comes from observation. I mean, are you kidding? I could be blind and deaf and be as rational as Noam Chomsky, and we have another huge part of cognition that has nothing to do with perception. The most remarkable quality of human cognition, the very core of our cognition, is the ability to take any two objects and select from an infinite set of possible abstractions: abstractions which are not deducible from percepts. Why does LeCun think that the only abstractions which are needed are directly deducible from perceptual information? All of this fails when you consider the fact that perception-derived data cannot deduce most rules about the world, certainly not in limited time and space. I'm talking about world models and abstractions and hierarchies; surely they cannot just be probabilities.

Now, LeCun's architecture can produce a tiny sliver of abstractions, a minimum spanning tree of which is directly deducible from the handcrafted priors of the encoders and the prediction models. I agree that these are abstractions, but they are essentially human-crafted, or at least human-seeded. The space of possible abstractions between two objects is infinite. Yes, infinite. LeCun said that objects may spontaneously emerge, and once the notion of object emerges in the representation, concepts like object permanence may become easy to learn: objects that disappear behind others due to parallax motion will invariably reappear. It was at this point that it occurred to me what LeCun really meant by abstractions. I can see how three-dimensional objectness might be deducible given the visual priors that we've designed into these models, but this is just a drop in the ocean.

Now, LeCun admitted that it's these handcrafted priors which determine what is represented in the models, and indeed which abstractions are deducible. He said that the joint embedding predictive architecture finds a trade-off between the completeness and the predictability of the representations: what is predictable and what does not get represented is determined implicitly by the architectures of the encoders and the predictors; they determine an inductive bias that defines what information is predictable or not. LeCun gives a concrete example of these different levels of description, or abstraction if you like. He says: "Let's take a concrete example: when driving a car, given a proposed sequence of actions on the steering wheel and pedals over the next several seconds, drivers can accurately predict the trajectory of their car over the same period. The details of the trajectory over longer periods are harder to predict, because they may depend on other cars, traffic lights, pedestrians, and other external events that are somewhat unpredictable. But the driver can still make accurate predictions at a higher level of abstraction, ignoring the details of trajectories, other cars, traffic signals, etc.: the car will probably arrive at its destination within a predictable time frame. The detailed trajectory will be absent from this level of description, but the approximate trajectory, as drawn on a map, is represented. A discrete latent variable may be used to represent multiple alternative routes." End quote. So LeCun goes on to say that a model could in theory work at multiple levels of description or abstraction simultaneously, just like humans do.
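Mechanically, a discrete latent variable over alternative routes is just marginalization: hold a distribution over routes and sum predictions over it, without ever representing a detailed trajectory. A minimal numeric sketch; the route names, probabilities, and travel times below are invented for illustration and are not from LeCun's paper:

```python
# Marginalizing a prediction over a discrete latent variable z (here, a route).
# All numbers and route names are invented for illustration.
routes = {            # p(z): distribution over alternative routes
    "motorway": 0.6,
    "back_roads": 0.4,
}
eta_given_route = {   # E[arrival time | z], in minutes
    "motorway": 30.0,
    "back_roads": 45.0,
}
# Expected arrival time with the route left unobserved:
expected_eta = sum(p * eta_given_route[z] for z, p in routes.items())
```

The detailed trajectory never appears at this level of description, yet the abstract quantity (expected arrival time) is still predictable.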
He asserted that the ability to represent world states at several levels of abstraction is essential to intelligent behavior. For Yann LeCun, reasoning is simply finding a good path through a state-action space. This is a very low-resolution view of a complex topic; it's the equivalent of predicting the weather tomorrow using the average temperature of this month. Deep learning folks tend to lean into the parlor tricks and lean away from any mechanistic understanding. LeCun admits there is an exponential blow-up traversing state-action spaces in his hierarchical joint embedding architecture, and suggests using a discrete approximate dynamic programming algorithm like Monte Carlo tree search to find good trajectories in tractable time. But a much better way to cut down the search space is with compositionality.

Compositionality is the principle that the meaning of a complex expression is fully determined by its structure and the meanings of its constituents: once we fix what the parts mean and how they're put together, we have no more leeway regarding the meaning of the whole. This is the principle of compositionality, a fundamental presupposition of most contemporary work in semantics, the study of meaning. We can understand a large, perhaps infinitely large, collection of complex expressions the first time we encounter them, and if we understand some complex expressions, we tend to understand others that can be obtained by recombining their constituents. And guess what: this doesn't just apply to expressions; it also applies to planning and reasoning. This is only possible with an algebraic approach to semantics and planning, achieved with symbolic manipulation.

"For any two physical objects x and y, if y is contained in x, then, if nothing exceptional happened to y, the location of y must be the location of x." Now, this is a symbolic rule: a function, a procedure. You cannot represent this fact without symbolic logic. This isn't data but rather a procedure which needs verification. First, you have variables of a specific type, where types come from an ontological structure; and second, you have quantification over these variables. That's what the upside-down A (the universal quantifier, ∀) means: it's addressing a potentially infinite set of possible values. Neural networks are extensional; they cannot represent intensional, which is to say infinite, objects. If you want a refresher on this material, check out our intro to the Gary Marcus and Luis Lamb show.

"It's good to be in an environment where people take these questions for granted, because I spent a lot of the last 20 years, almost even more than 20 years, trying to get people to recognize the importance of abstraction. I came to this having worked in psychology on children learning rules, and came into the first wave, or second wave depending on how you count, of neural networks, with people trying to argue that there was no abstraction, that it was all just basically memorization through multi-layer networks; they were then three-layer networks. It's been a long, hard slog to get people to realize how important abstraction is, and I think that there's been a real sea change in the last couple of years."

The important thing to realize is that the only way to represent infinite objects in a finite way is using quantification, or logic over typed symbolic structures. That's why neural networks cannot do basic arithmetic: you need intensions, which are symbolic procedures over variables. Let's look at another example, addition. Here's a program: add(0, n) = n; add(m, n) = 1 + add(m - 1, n). It's a finite representation of an infinite object. Addition of m to n is nothing more than adding m ones to n, or m successors to n, and from that you can define multiplication, because multiplying m by n is adding m n's, or adding n m's; that's why it's commutative.

Yeah, compositionality: I'm very excited to hear terms like compositionality being discussed as of late; that's music to my ears. But there were huge results, mathematically worked out, showing that compositionality actually evolved for survival reasons. The example people usually give is language: as I speak, you're interpreting and understanding what I'm saying in practically near real time. Let me make you appreciate the complexity here. You're taking a sequence of sounds (in this case, because I'm speaking; it could be written), I'm producing a sequence of words, and you're almost in real time building a mental picture of what I'm saying: the thought that I'm trying to convey. In the absence of compositionality, every time you would have to go and grab a sequence and make a meaning for it, then make a meaning for the whole thing. If you didn't have rules for subparts already built in, so that you could just say, "they said a phrase, I know how to build a meaning for that one, and then I do three or four operations and I'm done"; if every time you had to try all the parts that should fit together, you wouldn't understand me in real time. We couldn't have communicated.

And here's the genius of Richard Montague, who mathematically showed this. People don't appreciate the work that these guys did; it's all large language models now. But look at what Richard Montague did. He said: I can have "John likes to play guitar"; I can have "the boy next door likes to play guitar"; I can have "my uncle's nephew who lives in Australia likes to play guitar". You get my point. Montague asked: how could all these things, "John", "the boy next door", have the same semantic type in the end, given that they fit in the same slot? The genius of Montague was to devise an algebra such that, no matter what you have there, when you do all the typing it will reduce to e, an entity. That's compositional semantics.

"There's been a lot of people who've been saying there is a limitation to deep learning, let's say, or machine learning more generally, because it's obvious that those things basically do curve fitting. What's our definition of reasoning? What is the process by which we elaborate models, and is there a qualitative difference between a model that merely performs curve fitting as we normally know it and a model that has, let's say, to adopt a terminology that others have proposed, a causal model of the data you're observing, which can be the basis for reasoning and things like that? And the answer to this is probably no. There is a difference, of course, but is it an essential qualitative difference? I'm not entirely sure. And then there is the argument: if there is a qualitative difference, which I'm not sure about, would this qualitative difference be in the form of fundamentally different things from deep learning, things like discrete symbolic reasoning or things of that type? And to that my answer is clearly no; I do not believe that's the case."

Okay, we were going to publish the show today, and look what Jürgen Schmidhuber has just dropped on his blog. He said that Yann LeCun's 2022 paper on autonomous machine intelligence rehashes but does not cite essential work of his lab from 1990 to 2015.
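The addition program quoted earlier in this chapter (add(0, n) = n; add(m, n) = 1 + add(m - 1, n)) really is a finite representation of an infinite object, and the multiplication defined on top of it really is commutative. A direct Python transcription of that quoted program, purely as a sketch of the idea:

```python
def add(m: int, n: int) -> int:
    # add(0, n) = n;  add(m, n) = 1 + add(m - 1, n)
    return n if m == 0 else 1 + add(m - 1, n)

def mul(m: int, n: int) -> int:
    # Multiplying m by n is adding together m copies of n.
    return 0 if m == 0 else add(n, mul(m - 1, n))
```

Two short definitions cover every pair of naturals, which is exactly the sense in which a symbolic procedure over variables finitely represents an infinite object.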
Now, Schmidhuber has established a bit of a reputation for constantly saying that all of the current ideas being published in deep learning today were already done previously in his lab, and he feels resentful that he hasn't had proper attribution. He says he's not without conflict here, and it might seem self-interested, correcting the record like this, but the truth of the matter is, he says, yes, it is self-interested: much of the closely related work pointed to below was done in his lab, and he wishes to be acknowledged and recognized. He's basically resentful that he didn't get the Turing Award along with the other godfathers. To be honest, I personally do think of Schmidhuber as one of the godfathers.

So he quotes LeCun: "many ideas described in this paper... almost all of them have been formulated by many authors in various contexts in various forms". And then Schmidhuber says: yes, in fact, unfortunately much of the paper reads like a deja vu of papers from his lab going all the way back to 1990, without any citations. So I'm scrolling down here. Mentions of controllers, world models, planning and rollouts: indeed, this was covered in Schmidhuber's papers. He's famously argued that GANs, for example, adversarial learning, are a specialization of one of his earlier models released from his lab. Then there's LeCun's idea about learning to act by observation. LeCun has this hierarchy of data streams with increasing amounts of agency, and, being a radical empiricist, LeCun thinks that we learn everything we know about the world largely by observation and mostly by not interacting with the world around us. Schmidhuber points to the recurrent predictive world model, which may be good at predicting some things but uncertain about others (this is the thing at the top here), so Schmidhuber thinks he's already been there and done that. This idea of hierarchical percepts as well: he says that in 1991 the neural sequence chunker, which is to say the neural history compressor, used unsupervised learning and predictive coding in a deep hierarchy of recurrent neural networks. So he thinks he's done that as well.

Most interesting for me, he commented on the symbolic component of Yann's paper. So, do we need symbols for reasoning? He said that he had previously argued for the importance of incorporating inductive biases into neural networks that enable them to efficiently learn about symbols. (Based on the Fodor and Pylyshyn paper, I think that's an oxymoron.) He said that many neural networks suffer from a binding problem, which affects their ability to dynamically and flexibly combine, which is to say bind, information that is distributed throughout the neural network, as is required to effectively form, represent, and relate symbol-like entities. He said he released a 2020 position paper which offers a conceptual framework for addressing this problem and provides an in-depth analysis of the challenges and requirements and corresponding inductive biases required for symbolic manipulation to emerge naturally in neural networks. I must admit I've not read that paper; I'm interested to check it out now.

So I'm just reading the abstract of that paper now, "On the Binding Problem in Artificial Neural Networks": primary author Klaus Greff of the Google Brain team, also with Schmidhuber, in 2020. It said that contemporary neural networks fall short of human-level generalization, which extends far beyond one's direct experiences, and they put it down to this binding problem, which they say affects the capacity to acquire a compositional understanding of the world in terms of symbol-like entities such as objects (by the way, this is exactly what we're talking about in this show), which they say is crucial for generalizing in predictable and systematic ways. To address this issue they propose a unifying framework that revolves around forming meaningful entities from unstructured sensory inputs and maintaining the separation of information at the representational level. So, I guess we'll get back to you on whether it's any good.

LeCun said that the centerpiece of the paper is the joint embedding predictive architecture, JEPA, and the main advantage of JEPA is that it performs predictions in representation space, eschewing the need to predict every detail of y. Schmidhuber says that in 1997, a quarter of a century ago, he built a general adversarial reinforcement learning machine that could ignore many or all of these details and ask arbitrary abstract questions with computable answers in representation space. He also noted that his even earlier, less general approach to artificial curiosity, dating from 1991, naturally directs the world model towards representing predictable details in the environment. Schmidhuber said that, given his comments above, he doesn't see any significant novelty there. He's not claiming that everything is solved, but he said that in the last 32 years his lab has already made substantial progress along the lines proposed by LeCun in his paper. LeCun said "below is an attempt to connect the present proposal with relevant prior work", and Schmidhuber said he cited a few somewhat related things while ignoring most of the directly relevant original work mentioned above, possibly encouraged by an award that he and his colleagues shared for inventions of other researchers whom they did not cite. His point is that these ideas are not as new as they may appear from reading LeCun's paper; there's a lot of prior work that is directly along the lines proposed, from his lab.

We've not had a great experience with him. We tried to invite him on the podcast, and he did seem initially interested, but then he suddenly demanded a fee for coming on. I actually offered him five thousand dollars for just 60 minutes of his time, and apparently that wasn't enough. The main thing I don't like about this is the tone of it, frankly. I think it's really easy just to go back in time and say, "oh, I invented something which is conceptually similar to this", because many things are conceptually similar; we've just made a whole show on the infinitude of abstraction space. I will hand it to Schmidhuber that they are very similar, but the architecture and approach that LeCun is presenting here is different. It's using modern methods; it's not using RNNs, for example. Yes, it's using the same abstract ideas. In fact, what I like about LeCun is that he presents all of his work in an abstract way. The pictorial formalism of energy-based models is very abstract; that's what makes it understandable. The way he describes contrastive and self-supervised learning is very abstract: he's talking about predicting unobserved information from observed information, and he uses a language which is very accessible to lots of people. So I can understand why other researchers would look at it and say, "oh, that's basically the same as what I've done", but that's because he's talking in the abstract. If you look at his physical models, they are different, in my opinion.

LeCun has been a huge advocate of passive, or so-called self-supervised, learning: "Supervised learning sucks. I mean, it's very limited, in the sense that you can train machines to do very specific tasks, and because they're trained to solve very specific tasks, they're going to use all the biases that are in the data to do that task, and if you try to get outside of that task, they're not going to perform very well. That's a limitation of supervised learning; it has absolutely nothing to do with deep learning."

And this has been a complaint of mine for a while about, let's say, the dominant paradigms of neural networks, from my perspective as a Bayesian. They've always been essentially maximum likelihood estimators. It's like: okay, I train my neural network to take in a whole bunch of inputs and to give me the one true value, which is really just the maximum likelihood computation from my inputs.
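The point-estimate complaint above can be made concrete with the simplest possible model, a coin flip; the counts below are invented for illustration. A maximum-likelihood fit returns one number, while the Bayesian treatment returns a whole distribution over that number, including the uncertainty the point estimate throws away:

```python
# MLE vs. a full posterior for a Bernoulli parameter (counts invented).
heads, flips = 7, 10

# Maximum likelihood collapses the data to "the one true value":
mle = heads / flips

# A Bayesian keeps the whole distribution. Under a uniform Beta(1, 1) prior,
# the posterior is Beta(1 + heads, 1 + tails); it has a mean AND a variance,
# i.e. it encodes how uncertain we still are after ten flips.
a, b = 1 + heads, 1 + (flips - heads)
posterior_mean = a / (a + b)
posterior_var = (a * b) / ((a + b) ** 2 * (a + b + 1))
```

Both summaries come from the same data, but only the posterior carries the uncertainty needed to combine or compare models downstream.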
And as a Bayesian, it's like: guys, that's just an estimate. The real truth is a distribution; there are many possible output values that this input could have given me. Maybe it's 75 percent this value and a bit less that, or it's a continuum if it's a density function. Of course it is; this is the nature of reality: there's uncertainty, and when you're trying to build models, if you want them to be generalizable, they have to encode this uncertainty in some way. Otherwise, if you're always just using the MLE, or some other statistical projection of that distribution, you're throwing out information; you're losing a lot of information. So I totally agree that neural networks, that the paradigm, need to evolve to take much more account of uncertainty. Where I object is: why aren't we just using the word probability? Suppose you agree that we need to start allowing for multiple possibilities. Great. Now you need a mathematics to deal with multiple possibilities, and we already have one: it's called probability theory, or better yet, conditional probability theory. We have hundreds of years of development and theory behind it. Let's use it, because if you don't, if you don't just admit that what you're doing is probability theory, then you're rolling your own probability theory, and just like the fuzzy logic people, who did the same kind of thing, you're going to wind up with all kinds of inconsistencies and problems. For better or worse, we only have one mathematically rigorous and consistent theory of uncertainty, and it is probability theory. That's just what it is: conditional probability theory.
Yeah, but devil's advocate: LeCun has been challenged on this before, with his energy-based models, where the punch line is that rather than store the normalized probability, which sums to one, you store these exponential energies. LeCun says, well, I don't care about calibration, which means I don't ever want to compare my models with other models, and I only ever want to make decisions. He also says that if you ever have a normalized probability distribution in high dimensions, the model is probably wrong anyway, and he gives the example of some density concentrated on a manifold of zero width, where if you sample that density you'll never get any samples on the manifold. So he says what you want to do is regularize it, as he does with his energy-based models, and that's what you end up with anyway. Yeah, there's a lot to say about that. As you can imagine, given that probability theory is centuries old, these questions have already been addressed in the Bayesian and probability theory literature; all this stuff has been talked about for a long time. First of all, if a probability is meaningless in high dimensions, then so is its logarithm, which is the energy function. You don't gain anything by ignoring the problem; you can ignore it, but then you wind up with inconsistencies. As for not needing to normalize the distribution: well, if you ever want to add two probabilities together, you had better normalize your distributions, and in fact he does do that. It's like, yeah, I need to do this kind of trajectory sampling, I need to do some normalization here, with Gibbs sampling or whatever, to actually add up different trajectories and calculate means.
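As a small sketch of that point: under the usual Boltzmann-style convention (an assumption here, p ∝ exp(−E)), an energy is just an unnormalized log-probability, and the moment you want values that add up, you have to pay for the normalizer:

```python
import math

def normalize(energies):
    # p_i = exp(-E_i) / Z with Z = sum_j exp(-E_j),
    # computed with the max-shift (log-sum-exp) trick for numerical stability
    m = min(energies)  # the smallest energy dominates exp(-E)
    weights = [math.exp(-(e - m)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

probs = normalize([1.0, 2.0, 3.0])
# only now do sums and mixtures of these values mean anything
```

The raw energies carry the same ranking information, but you cannot add or average them across models without computing Z, which is exactly the step the energy view defers.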
As for not doing model comparison: here's something that's important to understand. From a Bayesian perspective, all of this regularization stuff that machine learning people do all the time is model comparison. What you're trying to do is say: I'm comparing a model A and a model B, where model A has fewer parameters, or is simpler in some way; which one should I prefer in light of the evidence? How much can I regularize it and still fit the data well? That's model comparison, or at least that's what a Bayesian would call model comparison, and we have a mathematics for how to do it. You have to compute these integrals, and I get it, that's hard. Totally understand; believe me, I've been there, done that, tried that. It's really hard: you wind up with these multi-dimensional integrals. But just ignoring it, forgetting about it and going back to maximum likelihood estimation, you're not going to make any advances there. Where I see cool advances made is when people embrace it. They say: okay, I've got to do this integral here, it's really intractable, but here are some approximations to it, and I know what I'm trying to do, I'm trying to approximate this specific integral, and I can come up with really nice approximations under this set of criteria. That gives you a whole theoretical foundation from which to advance the approximations, rather than giving up, going over into energy land and doing arbitrary hacks and approximations for which you have no theory. That's why we end up with all these things in machine learning like batch norm. Why does batch norm even work? People argue about whether we should do it or not; they have no theoretical foundation. It's just hacking, right?
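One well-known approximation to that intractable evidence integral is the Bayesian Information Criterion. The toy data and the pair of models below are my own illustration of regularization-as-model-comparison, not anything from the conversation; BIC approximates −2·log(evidence) for large n, so lower is better:

```python
import math

def bic(rss, n, k):
    # BIC under Gaussian noise: a fit term plus a complexity
    # penalty of k*log(n) for the k free parameters; lower is better
    return n * math.log(rss / n) + k * math.log(n)

# toy data: essentially a constant 3.0 with tiny alternating noise
xs = list(range(10))
ys = [3.0 + 0.01 * ((-1) ** i) for i in xs]
n = len(xs)

# model A: a constant (1 parameter)
mean = sum(ys) / n
rss_a = sum((y - mean) ** 2 for y in ys)

# model B: a straight line (2 parameters), ordinary least squares
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
rss_b = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

bic_a, bic_b = bic(rss_a, n, 1), bic(rss_b, n, 2)
# the line fits marginally better, but the penalty favours the constant
```

The extra parameter buys a slightly smaller residual, yet the evidence approximation still prefers the simpler model, which is the regularization trade-off stated as model comparison.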
Yeah, well, it's quite ironic, because even Yann LeCun himself wants to have end-to-end gradient-based systems, but he admits that in his joint embedding predictive architecture there's a hierarchical part for stacking the concepts, and there's also what he calls a mode-two, Daniel Kahneman-style part, where you have this discrete action space and you learn abstractions of actions over time. And of course he needs to add these probabilities together to generate trajectories through this action space, so what does he do? He uses Gibbs sampling. It's the same thing all along: because we've got this discrete action space, he can't use a gradient-based method to do the traversal, so he has to use Monte Carlo to search. And obviously he's also an empiricist and a blank-slate guy, and yet he's creating this very complex and highly specific cognitive architecture, which reminded me of the systems of the 1980s. So it's not quite as puritanical as you would think. Yeah, I know. Well, one thing I really respect about LeCun is that he's a pragmatist, so at the end of the day he really cares about things that work, and I have to think a large part of what he's doing here is trying to get the orthodox machine learning community to move in a certain direction, to improve these systems. But he's got to do so gently, guide them maybe one step at a time, avoiding trigger words: let's not say probability, because maybe that triggers somebody, or Bayesian, because that'll trigger somebody. I don't know. They're moving in the right direction, let me just say that; it's just that things are going to be a bit slower than they need to be, because we're not embracing a lot of the theoretical results that already exist, and instead of tackling some of
the really hard problems, we're sort of trying to avoid them temporarily. But it's okay; eventually the truth will out, and people will get this. So what was that fractions analogy for energy-based models? Well, I was just thinking: this idea that, hey, we're working with probabilities but we can just ignore the normalization and work with the energies, is almost like somebody coming along and saying, you know, whenever I'm doing arithmetic, adding fractions is such a pain, because I have to find a common denominator every time I add fractions together, and when the denominators get to be really big integers I have to do this massive multi-digit multiplication; forget that, I'm just going to add the numerators and ignore the denominators. It's like, okay, you can do that, but you're computing a completely different function. Adding the numerators together and adding the denominators together gives you the mediant of two fractions, not their sum, and I think the mistake has a name, "freshman addition" or something, because it's such a common error. That's what this seems like to me: let's ignore all the hard work and just do a hack, and it just doesn't work; it introduces problems. What do you think LeCun would say to that, though? Well, in the paper itself he acknowledges that there are many scenarios where you can't ignore the normalization; he kind of admits that, and uses it in certain places. But that's just Bayesian probability at that point. Bayesians don't have any problem understanding that there is a normalization constant; sometimes you can work without it, sometimes you can't, and so they'll defer normalization until necessary. The problem is that there are so many scenarios, so many useful scenarios, where normalization is required.
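The "freshman addition" mistake is easy to demonstrate: adding numerators and denominators gives the mediant, which lies between the two fractions rather than summing them. A quick sketch:

```python
from fractions import Fraction

def freshman_add(p, q):
    # the shortcut: add numerators and add denominators...
    # ...which is actually the mediant, not the sum
    return Fraction(p.numerator + q.numerator,
                    p.denominator + q.denominator)

a, b = Fraction(1, 2), Fraction(1, 3)
correct = a + b                # 5/6, via a common denominator
shortcut = freshman_add(a, b)  # 2/5, a completely different function
```

Skipping the common denominator doesn't give an approximate sum; it gives a different operation entirely, which is the analogy to skipping the normalizer.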
I want to comment on something interesting you brought up there, which is this desire to have differentiable learning methods. I totally get that desire; I wish we had differentiable methods for program search in general. Where I run into a problem with it is this: he talks a lot about world models, and he says that the big problem facing AI in the future is how we build these world models, how we represent them, how we learn them. Well, here's something to think about, which occurred to me when I was reading that paper: a lot of the world we're operating in right now, that we want to model and understand, for better or worse consists of symbolic computational systems. Pretty much every single program in existence today is a symbolic piece of code running on a von Neumann machine, some kind of finite Turing-machine-like thing. These symbolic systems are all over the place. And whether we like it or not, people's cognition at a high level is pretty much symbolic; it may be implemented at the lowest level by sub-symbolic nano-things or whatever, but we operate in this symbolic realm, surrounded by symbolic machines, programs and software. Your world model is going to need to be able to model those things, and if we're trying to model symbolic systems, are we really hopeful that we can model them well enough with non-symbolic differentiable systems? I'm pretty skeptical. I think we have to grab the bull by the horns, as they say, and embrace the fact that we've got to figure out how to do search over these discrete spaces. I don't know how to do it; nobody knows how to do it; but we've got to figure it out, whether it's some evolutionary algorithm or whatever the case is.
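As a toy illustration of what search over a non-differentiable discrete space can look like, here's a minimal random-mutation hill climber; the bit-counting objective and single-bit mutation scheme are illustrative assumptions of mine, not anything from LeCun's paper or DreamCoder:

```python
import random

def fitness(bits):
    # a discrete objective with no useful gradient: count the ones
    return sum(bits)

def hill_climb(n_bits=32, steps=2000, seed=0):
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    for _ in range(steps):
        # flip one random bit; keep the change only if it doesn't hurt
        candidate = list(current)
        i = rng.randrange(n_bits)
        candidate[i] ^= 1
        if fitness(candidate) >= fitness(current):
            current = candidate
    return current

best = hill_climb()
```

No gradient is ever computed: progress comes purely from mutate-and-select, which is the basic move shared by evolutionary methods like NEAT, however much more structure they add on top.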
If we can crack the nut of learning how to do search over the space of all possible programs, outside of the space that's differentiably accessible, we're going to make huge progress there, whether it's DreamCoder or something else, or NEAT, neuroevolution of augmenting topologies, whatever it is. We need to put much more effort into learning how to search that space that's not differentiably accessible. I agree, and I think it's easy just to say that; there are other challenges as well. LeCun said in the paper that something as simple as being able to take a goal and break it down into intermediate sub-goals at different levels of description is something we just take for granted. Yeah. And similarly with the problem in cognitive science of categories: just being able to draw circles around things at different levels of description, and traverse between them, is something we take for granted. So, Professor Yann LeCun recently released an article called "What AI Can Tell Us About Intelligence": can deep learning systems learn to manipulate symbols? The answers might change our understanding of how intelligence works and what makes humans unique. This is pretty much a direct response to some of the hype in the AI community. He does have a huge stab, actually, not only at Gary Marcus, which is about symbols; there are many other fronts being fought in the world of AI at the moment. But there's so much hype in deep learning right now that we almost need to create a bingo card. What have we got on that card? Some people think scaling is all you need; well, Yann LeCun agrees that that's nonsense. Some people think that reward is enough; Yann LeCun also thinks that's nonsense. Some people think that AI systems today are slightly conscious; Yann LeCun thinks that's nonsense. Some people think that AI systems understand us; Yann LeCun thinks
that's nonsense as well. Some people think that deep learning can do symbolic manipulation; well, Yann LeCun thinks it can, and Gary Marcus thinks it can't. Some people think that data is all you need; some people think that emergence is all you need. Anyway, I was really impressed when I read this article from Yann, because in a way I was really happy that he was calling out so many of these AI hypesters and the stuff that I perceive to be nonsense. But I think it's very unfair towards Gary Marcus, because unlike OpenAI, which has been spouting all of this utter nonsense, Gary Marcus just has a different perspective. There are so many different perspectives on artificial general intelligence, and Gary Marcus has the same perspective as Noam Chomsky, which is that psychology has a lot to say about artificial general intelligence. And yes, maybe he has a philosophical agenda; maybe he even has a monetary agenda, because if the focus changed to his view of artificial general intelligence he could create a startup company and become as successful as Yann LeCun has. But I think it's very unfair to criticize Gary Marcus in this way, because Gary Marcus still has his credibility intact. Now on to chapter two: the emergent abilities of large language models. There's an interesting paper just out called "Emergent Abilities of Large Language Models" by Jason Wei et al. They say that scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. The paper discusses the unpredictable phenomena which they refer to as the emergent abilities of large language models, and they consider an ability to be emergent if it is not present in smaller models but is present in larger models; thus emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. Now, I don't really like their definition. I think a better definition would be a transient change in phenomena, right? It's
not the relationship between small models and large models, because if there were a continuous improvement in perplexity, no one would call that emergent. So my definition of emergence, and we've been thinking a lot about emergence because we just did a show on it, is an unexpected and transient change in macroscopic phenomena; that's the best definition I can come up with personally. Now, in this paper they seem to have the, in my view, incorrect impression that emergence is the same thing as extrapolation, similar to that grokking paper from a while ago, which people misquote all the time. They say, oh, there's this sudden snap point where the perplexity goes down, the validation accuracy goes up, and suddenly the model can extrapolate. I don't think that's really true; it doesn't pass the sanity test for me. It's just a quirk of optimization that the model suddenly fits the dataset better; that's not the same thing as extrapolation. Extrapolation means that I can fit functions outside of the training range. The problem is that this is quite easy to measure with things like arithmetic, but quite difficult to measure with some of the language tasks, for example on BIG-bench, because many of these things don't exist in the training data and it's very difficult to measure them. So the whole concept of extrapolation becomes very vague with some of these language models. And assuming it's not extrapolation, why is it so surprising, and why is it so interesting, that we have a sudden jump in perplexity? I just don't get it; I don't think it's that big of a deal, honestly. Definitely interesting, definitely requires further examination, but I think this is the main argument used by the folks who talk about scaling laws, these emergentists who think that if only we could train for longer, or on more data, or on bigger models, then suddenly we'll get this snap and we'll get this recursive, self-improving artificial general intelligence.
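A toy version of the fitting-versus-extrapolating distinction (my own illustration, nothing from the paper): a straight line fit to y = x² on [0, 1] looks tolerable inside the training range but falls apart outside it, because fitting the training data better says nothing about behaviour beyond it:

```python
# ordinary least-squares line fit to y = x^2 on the training range [0, 1]
xs = [i / 10 for i in range(11)]
ys = [x * x for x in xs]
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
pred = lambda x: slope * x + intercept

in_range_err = abs(pred(0.5) - 0.25)    # inside the training range
out_of_range_err = abs(pred(3.0) - 9.0) # well outside it
```

The in-range error is modest while the out-of-range error is an order of magnitude worse, which is why a sudden drop in training or validation loss on in-distribution data is not, by itself, evidence of extrapolation.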
That just doesn't make any sense to me; I genuinely don't believe it. Chapter three: on empiricism. Chomsky argued that the way we actually acquire the faculty of language, and therefore its relationship to experience and indeed to the physical world, is radically different from the empiricist tradition. Chomsky became well known for his famous 1950s critique of behavioral psychology. Behavioral psychologists tended to believe that humans were an unadulterated blank slate made of putty, which was then molded and shaped by our environment through a process of empirical stimulus and response. They believed that simple reinforcement learning was how we modeled the world around us, indeed how we learn language. Chomsky argued that this could not possibly explain how virtually all human beings, regardless of their intelligence, do something as miraculous as master the faculty of language, even when they're not deliberately taught it, as most people probably aren't, and do this at such an extraordinarily young age and in such an extraordinarily short space of time. He argued that for this to happen at all we must be genetically pre-programmed to do it, and therefore all human languages must have in common a basic structure which corresponds to this pre-programming. As Bryan Magee pointed out in his 1970s BBC interview of Chomsky, this also has some very negative implications, chief of which is that anything which can't be accommodated to the structure, any piece of the cosmic jigsaw puzzle which can't connect with the other pieces, is linguistically inexpressible and unintelligible to us. So the general principles common to all languages set vital limits on our capacity to understand the world and communicate with each other. But Chomsky would argue that this is the very basis of our creative capacity to understand an infinite space of abstractions, and also to understand anything which has been expressed by another human being. Essentially,
your whole approach represents a rejection of the empiricist tradition in philosophy, doesn't it? Because the very fact that you think the empiricists are wrong about how we learn must mean that they're wrong about knowledge and the nature of knowledge, and the nature of knowledge has been the central problem in the whole empiricist tradition of philosophy. Well, the classical empiricist tradition, which I think was represented, let's say, perhaps in its highest form by Hume, seems to me to be a tradition of extreme importance. When we investigate it, I think we discover that it's just completely false: that is, the mechanisms that he discussed are not the mechanisms by which the mind reaches states of knowledge, and the states of knowledge attained are radically different from the kinds that he discussed. For example, for Hume the mind was, in his image, a kind of theater in which ideas paraded across the stage, and it therefore followed necessarily that we could introspect completely into the contents of our mind: if an idea is not on the stage, it's not in the mind, and the ideas may be connected and associated. Well, that's a theory, and in fact it's a theory that has had an enormous grip on the imagination throughout most of the history of Western thought. That same image dominates the rationalist tradition as well, where it was assumed that one could exhaust the contents of the mind by careful attention. Chomsky was inspired by continental rationalism, which refers to a set of views more or less shared by a number of philosophers active on the European continent during the latter two-thirds of the 17th century and the beginning of the 18th. Rationalism basically defines your view of knowledge: as a rationalist, you consider the primacy of reason and intuition over sensation and perceptual experience. You would tend to think that most ideas, rules and knowledge are innate, and just
like our friend Dr. Walid Saba, you would eschew any type of uncertainty quantification; you would prefer to deal in absolute black-and-white certainty, and anything else, as far as you are concerned, wouldn't be knowledge. So on the Discord community I made the statement that knowledge cannot be entirely derivable from perceptual information, and they were having none of it. Actually, most abstract mathematical knowledge is far vaster than human experience, right? So there's this kind of dualism; it's what I call the mind-experience dualism. Now, given the apparatus we have, we can interconvert between perceptual experience and mathematically abstract experience, but if you take all of our perceptual cognitive powers and you want to draw a Venn diagram, it's a dot in a huge ocean; the abstract world is much, much bigger, and all experiential knowledge is almost a tiny dot in that ocean. And you know what they're doing now in AI? They're concentrating on the damn dot. No, no: learning from instances is brutal, guys. So when you want to defend it, you have to tell me how. Hold on, look: learning from data, either you adopt it and defend it or you don't; there's no middle ground. If you want to defend learning from observations, a template like this, you have to convince me how a child learns 200 million sentences from few examples. If you don't have an explanation for that, the rest is hand-waving. Well, Walid Saba thinks that empiricism is a huge house of cards: as soon as you allow any symbolic manipulation, the whole thing falls down flat on its face. So according to Walid Saba, if you're going to be an empiricist, you have to be an empiricist all the way. And let me make an example of this: take a guy who lives in Bangladesh and a guy who lives in Amsterdam. Perception-wise, observation-wise, empirical-data-wise, they're so far apart that they live practically on different planets, but what they know is
the same; the knowledge that is not empirically obtained is the same, and that's all they need to survive beyond their individual observations. A four-year-old knows that if I say I'll have a Greek statue in every room in my house, I'm not talking about one single Greek statue; what I mean is that in every room in my house I have a Greek statue. And they know this, knowledge not from observation, because a physical object cannot be in more than one location. But that requires symbolic manipulation. Devil's advocate, though: let's imagine I'm playing a computer game in which you can have the same statue in two rooms at the same time. Say it's a blue statue, and I'm observing lots of episodes of this game, and I very quickly learn that you can have this statue simultaneously in two different rooms but not both in the same room. I could then create a cognitive program, and I've just empirically learned that program. No, I challenge you even to think it. Look, Tim, even in the world of Star Trek, where we can decompose all our molecules and transport them and reassemble them and all that stuff, even in that world of sci-fi there are things that the mind cannot accept; this is how the universe works. So I challenge you to imagine the two blue statues in two different rooms as one; you can't even think it. I think some of this comes down to matters of definition and where you draw the black-and-white lines. Traditionally, a dimension on which empiricists and rationalists differ is the degree to which they admit the existence of innate knowledge or innate concepts. Obviously, on one extreme is zero: that would be the ultra-empiricist who says there's no such thing as any kind of innate knowledge, that literally, if the unit of analysis is an individual human mind, it starts as some type of
literal blank slate, with everything after that learned through empirical observation alone. And on the other extreme: I don't think rationalists have ever denied that some knowledge is empirical; I don't know that they've ever denied that. But perhaps the extreme version there is to say that there's a kind of superiority to deduced, rationally derived knowledge over empirical knowledge, so that, in a way, the Platonic ideals are in a sense more reliable, more ultimate or purer than any type of empirical knowledge. I think any of us should probably recognize that the answer is somewhere in between, and where the dividing line sits, beyond which you become a rationalist instead of an empiricist, I'm not sure; I don't know where they draw the battle lines today. But if we take, say, LeCun as an empiricist: he admits that there is prior knowledge, that you need these inductive priors to be quite useful. After all, one of his huge achievements was the CNN, i.e. structurally encoding a certain kind of prior, translation invariance, into neural networks. But he thinks it should be as close to zero as possible, so minimal that it's just enough to jump-start or bootstrap a learning system, and from then on it can learn everything on its own through observation of data. And I think people on the other side of the camp, and I include myself there pretty much, and I think Walid too, believe that there's just not enough data, not enough computational resources, not enough time for that to make sense. Sure, there are a lot of concepts, a lot of knowledge, that you can learn by observation and then reasoning on top of that. And by the way, empiricists now no longer deny that
reasoning happens, right? But they view reasoning as just generating connections between facts that were learned empirically, not as bringing facts to the table itself, but only as a mechanism for connecting and deriving from facts. I just think that that's not realistic, not pragmatic, in the same way that, sure, a neural network of infinite size, like AIXI, as the AIXI folks say, can do everything. Well, that's not what I'm interested in. I'm interested in pragmatic systems, and I think there are types of knowledge, derived from rational sources, these deductive, logical kinds of sources, that you can't hope to reach pragmatically from empiricism alone. And what fascinates me is: okay, what's the mechanism by which that knowledge entered into our brains? That, to me, is the fascinating question. Yeah, I want to get to that in a second, because I know you've got some very interesting ideas about Friston's free energy principle and how it relates to our existence. But I don't think LeCun, or even Jeff Hawkins, would deny this. You remember Jeff Hawkins: he cited Vernon Mountcastle, that famous neuroscientist, and this idea that the neocortex is made of all these little units that are exactly the same, just wired differently. But that's still a prior. He said that some of our neurons are fed from our sensory-motor circuits, and that there are other neurons that are kind of like concept neurons; but if you think about it, the way the brain is structured is still a prior. So any conception that we have: you were just saying that empiricists think there's a stage, and all the ideas we have are derived from things on that stage. Well, if you look at the brain, it is a stage; all of that sensory-motor signaling is a stage. So I don't think they would really deny that the structure of that stage defines a lot of our
conception. Yeah, I can't put words in their mouths, but I don't think they deny that; I would hope they don't, because it's obvious that there is structure there. And as I said, I think LeCun has agreed that, sure, there is some initial structure that you need in order to get bootstrapped, if you will. Maybe we can talk about it in terms of something simpler, say logic gates. Imagine that your brain consisted only of NAND gates, one of the logic gates from which you can build any logical operation. You have trillions of NAND gates, you connect them up in a potentially fully connected way, every single NAND gate connected to every other NAND gate, and then the goal of learning is to turn some of those connections on and off such that you wind up with something useful. I think the question is just how much you have to start with. Maybe if you start with a random knockout, randomly assigning a bunch of connections, can you then learn from that, through some type of backpropagation or, since we're now in a digital space, some type of EA algorithm? Can it learn things, or do you need some additional structure on top? Because if you look at the human brain, it's way, way far away from that extreme of fully connected NAND circuitry. It's very sparse; it has structure in these cortical columns, like Hawkins talks about, organized in a very particular vertical and also horizontally parallelized way. There's tons and tons of structure there. In the space of all possible networks, it's a very long way from a fully connected tabula rasa with just random connections.
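The NAND point is easy to make concrete: every Boolean operation can be wired from NAND alone, which is why it works as a universal primitive in this thought experiment. A minimal sketch:

```python
def nand(a, b):
    # the single primitive gate: 0 only when both inputs are 1
    return 0 if (a and b) else 1

# every other gate wired purely from NAND
def not_(a):    return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b):  return nand(not_(a), not_(b))
def xor(a, b):  # the classic four-NAND XOR
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

# exhaustive truth-table check
for a in (0, 1):
    for b in (0, 1):
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
        assert xor(a, b) == (a ^ b)
```

Functional completeness is the easy part; the hard question raised above is how much wiring structure you need before any learning rule can find useful circuits in that space.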
So I think they have to admit that there's structure there; it's just that they want it to be as lightweight, and the constraints to be as loose, as possible, because that allows the learning algorithm as much flexibility as possible to learn things that are most optimized and most appropriate for any particular task. Yeah, I agree with that. And when Walid was talking about cognitive templates earlier, his main rationale was that he doesn't want to be surprised by reality; he says that someone who makes poor predictions about the world is someone we call crazy. That's what Walid would say, but I don't entirely agree, because, and I put this to him, I could have a computer simulation in which the physics are different, the reality is different. I could quickly come up with a cognitive template in my mind that in this computer world you can have two blue objects simultaneously existing in two different rooms, and I would very quickly learn that cognitive template and could reason over it. So I would have just empirically come up with a new cognitive template, right? I agree, and you don't even have to imagine simulations to find very strange things. I think this is kind of what Chomsky gets at when he says that we have this innate radical empiricism: we have this false belief that concepts we learn at the scale of pool balls and apples falling from trees must also apply to electrons and protons and atoms and whatever else, and they just don't. Even take a simple concept like two objects cannot exist in the same place at the same time. Well, that's true if they're fermions, but if they're bosons, photons for example, you can have as many photons as you want in the same place at the same time. So physics alone, and just the
world in general or take mixing colors if you mix pigments together red and green pigment for example is going to wind up with kind of a brownish muck but red and green light produce yellow so you get all kinds of almost antithetical or opposite behaviors of things that happen all throughout the natural world and i don't see people as having any trouble really learning a new set of rules if you will i mean i play a lot of video games or used to back before we started doing the show and i had time to play video games but for example you run into a video game that has a time travel element and it doesn't take long to learn how to work in this area where you can move in a dimension that corresponds to time or you learn different sets of world rules people somehow don't really have trouble necessarily but there are areas where we're really driven to be these radical empiricists and certain concepts that we hold as almost inviolable you know yeah i mean but this is what i wanted to get to as well that in walid's refutation of empiricism he says that anything that can be experienced or observed will be known but we spoke with kenneth stanley and he says well you know we experience consciousness and we don't know it if we have a hallucinogenic experience or a dream do we know it well actually we reason to the best explanation so we try and hang it on a structure which is already in our brain so the really interesting bit is whether we can take percepts and whether we can build an abstract structure and that does largely depend on the structure we already have but this to me makes me think well whether it's innate already is irrelevant because we know that we can create new structure in response either to thinking and reasoning or in response to new perceptual information by modifying these capacities what we might do
however is i mean it's at least in theory imaginable that we might discover something about the limits of our science forming abilities we might discover for example that some kinds of questions simply fall beyond the area where we are capable of constructing explanatory theories and i think we maybe even now have some glimmerings of insight into where this delineation might be between intelligible theories that fall within our comprehension and areas where no such theory is possible yeah so i agree but i do think there are limits and i think this is one of these real mysteries so in some areas people are very flexible and in others we're not and i think what was interesting talking to chomsky was when we asked is it possible that there are really just these limits to human cognition this horizon beyond which we may never go and he brought up the example of rats and prime number mazes a prime number maze is a maze where at every prime intersection you take a right for example and if you do that then you get to the cheese or you escape the maze or whatever the goal is and no amount of training no amount of time can ever train a rat to complete a prime number maze because their brain their cognitive structure just doesn't have the concept of prime numbers it's just totally lacking it's as if the capability to understand the concept of prime numbers had simply been excised and he said it would be a miracle if we human beings don't have similar limitations so maybe there are concepts that are necessary to understand all of physics to understand the physical universe that just don't exist in any human mind and it may even be the case that we can't even formulate them in any externalized intelligence we may just have almost a blind spot to this particular concept and i think that's fascinating you know i think
there's in my mind really two possibilities here one is chomsky's correct that it would be a miracle and in fact there are blind spots in human cognition on the other hand i think it's also possible that there may be a level of cognition at which let's say it's higher order logic once you get to the ability to understand higher order logic it may be possible that all facts of the universe can somehow be described in higher order logic i don't know the answer but i think it's a fascinating question well let me push back on that a little bit because yeah the fascinating thing is when you reach potentially a threshold of intelligence everything might be accessible because there is this infinity of abstractions out there in abstraction space most of which aren't particularly useful and most of which will always be unexplored but with the rat example yeah they can't reach the abstraction of a prime number and apparently they were not trainable but if you think about it shortcuts exist because abstraction space is a topological space and you could create a breadcrumb trail and in theory you could train the rats to learn a sequence of mechanical steps just like a neural network does to effectively perform a prime number maze without understanding i wonder whether that's possible yeah i mean i think it's always possible in any finite scenario so a maze that has at most n intersections i think it's always possible and we talked about this in the context of neural networks too sure you can memorize so i mean i haven't personally looked at the rat maze literature but i imagine you could probably train a rat for a particular maze to always take like the second and the third and the fifth turn or something but i think probably the point is
that there's a very small number of turns at which it's limited it can't go beyond that and it can't generalize to arbitrary n i mean obviously it takes a certain amount of time to escape and it'll die before it can go through one million turns or whatever that number is but this is the distinction that we always try to make about algorithms and concepts that generalize such that they can operate over almost any arbitrary time or number of operations as long as you can extend the memory space so that the code the algorithm is completely separate from the memory space if you will and so if i was in a prime number maze and you just stuck me in the middle and said hey welcome to the prime number maze see you when you get out well i could get out for a really large number of turns right because i can count you know two three five might take me a little while but eventually i can get out of there and if you have a machine that has an expandable memory it could continue that for an arbitrary number of cycles and escape it wouldn't need to memorize it right that's right but then it comes down to the machine of our brain or a rat's brain and i'm interested in the dichotomy between being able to mechanistically perform what is an abstraction and as you say if the machine could take enough steps in this abstraction space then it could perform the prime number calculation without understanding it and this brings me on to godel because walid brought this up as well so godel's proof is all about abstraction it's the ability to prove a theorem finitely right so a proof is an abstraction and like walid talks about with quantification logic right that's the ability to use a finite object to address an infinite set so walid said godel's proof is about a space of things that cannot
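the algorithm-versus-memorization point above can be sketched concretely: a memorizer only handles the turns it has seen, while an agent that computes primality with expandable memory escapes a prime number maze of any length. a hedged toy model (the maze convention, "turn right exactly at prime-numbered intersections," is our own illustration):

```python
# toy "prime number maze": at intersection k the correct move is
# right exactly when k is prime, left otherwise. the counting
# agent below works for arbitrary n; a rat-style memorizer would
# only cover the finitely many intersections it was trained on.

def is_prime(k: int) -> bool:
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def turns(n: int) -> list:
    """the correct turn at each of the first n intersections."""
    return ["right" if is_prime(k) else "left" for k in range(1, n + 1)]
```

the separation between the short primality-testing code and the unbounded memory it consumes is exactly the algorithm/memory-space distinction discussed above.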
be known yeah i mean it's always treacherous territory when you get into godel's proof because everybody thinks everybody else doesn't understand it and certain philosophers think that they have the one true understanding so i don't want to present what i'm trying to say here as the truth you guys have to decide for yourselves right but the idea is that in any particular finite formal system so we set down here's the set of rules that we follow and there's a finite list of those rules and then we start with you always have to start somewhere as we've been saying this whole conversation you start with a set of axioms and it's a closed system new axioms don't enter the picture later on there will always be statements that you can write down in that formal system that are true they're true statements but cannot be proved within that formal system so the idea is that you can't prove everything either you're incomplete which is what this means there are statements that are true which i can't prove so i'm not complete my formal system isn't complete enough to prove every true statement okay or you're inconsistent so you can have a set of rules in which it's possible for example to prove that a statement is both true and false but then of course you can prove any statement at all so you've got to pick your poison you're either incomplete or inconsistent but it's important to keep in mind that this applies to closed formal systems and technically and mathematically human beings and human cognition are not closed this is in fact what science does right it poses a question okay i don't know if the statement is
true or false i don't know and i can't prove it but i can go do an experiment and then the universe tells me whether or not it's true or false right that's the fascinating thing about human beings embedded in this universe is that we have this process called science which is an open system i don't want to say open-ended because that gets us into a totally different question but it's an open system that can interact with the universe and new stuff comes in right so we're not strictly bound by godel's theorem although any particular finite system that we write down is but we're always expanding that by conducting science and doing experiments that huge apparatus of reasoning has nothing to do with priesthood end of story we don't sense transitivity i mean that is a ridiculous paradigm and it has been disproven by the way skinner has been skinned a long time ago behaviorism at least by reputable people i mean it's a joke you still think behavior is all i need to know i learned in kindergarten i don't need to know how to ride a bicycle that's where learning happened you learn the skill you don't learn knowledge you acquire it you just go and grab it right i learned how to play guitar but the universe doesn't give a damn if i play guitar or not because the folk term for both is learn right but really we don't learn that stuff we acquire it it's knowledge acquisition i go and i acquire it i grab it i steal it right learning is different regarding empiricism there's this issue of what's your unit of analysis so are you talking about an individual human mind because for an individual human mind it's abundantly immediately experimentally obvious that we're not empiricists we have encoded innate knowledge for an individual human being but i think like friston would expand the level of analysis to the species and life in general which is that this process of life has been
evolving and developing this set of dynamics that survived and so whether or not you call that empiricism is kind of a definitional matter what i'm more concerned about is what's the mechanism by which that knowledge becomes encoded in the human circuitry or the circuitry of life and what i think we're missing or not talking about enough is that yes observation and interaction with the environment definitely play a role but just the mere fact of survival seems to play a role in imparting that knowledge and so it'll be interesting to see how over time we learn more and more about how to apply survival which we do in say evolutionary algorithms right in evolutionary algorithms you have a population some survive to reproduce and some don't versus observation and reasoning and that type of dynamic learning that you do and then the other point that you brought up there is once you have this knowledge however it gets there from your genetic endowment or from the laws of nature or whatnot the human cognitive system has learned to take those seeds of knowledge and generate with them all kinds of mathematics all kinds of abstract ideas that have no correspondence to anything that we know of in reality and that's pretty much the bootstrap right it's kind of weird that there is this platonic world of ideas that can all be consistent and interconnected and you can say all kinds of useful and crazy well i say useful you can say all kinds of consistent and interesting things about them and yet there's nothing in the physical world that corresponds to them but they're no less sensible no less mathematical they just don't have a
correspondence to reality and likewise by the way on the other side of the coin there's clearly a lot about the universe that's happening that we have no sufficient mathematics to describe so there's a vast chunk maybe the piece that's even mappable to the physical world is measure zero in the vast space of mathematics so there's this vast chunk that doesn't map to anything in reality and there are things happening in the physical universe for which we have no mapping to mathematics so there are almost these two separate worlds that slightly overlap in this little sliver of math that we would call physics right chapter four cognitive templates in fact while it's true that our genetic program rigidly constrains us i think the more important point is that the existence of that rigid constraint is what provides the basis for our freedom and creativity you mean it's only because we're pre-programmed that we can do all the things we can do exactly the point is that if we really were plastic organisms without an extensive pre-programming then the state that our mind achieves would in fact be a reflection of the environment which means it would be extraordinarily impoverished fortunately for us we're rigidly pre-programmed with extremely rich systems that are part of our biological endowment correspondingly a small amount of rather degenerate experience allows a kind of great leap into a rich cognitive system essentially uniform in a community and in fact roughly uniform for the species which would have developed over countless evolutionary ages through basic biological evolutionary processes the basic system itself developed over long periods of evolutionary development we don't know how really but for the individual it's present as a result the individual is capable with a very small amount of evidence of constructing an extremely rich system which allows him to act in the free
and creative fashion which in fact is normal for humans we can say anything that we want over an infinite range other people will understand us though they've heard nothing like that before we're able to do that precisely because of that rigid programming but without it we would not be able to at all what account are you able to give of creativity if we are pre-programmed in the way you say then how is creativity a possibility for us well here i think one has to be fairly careful i think we can say a good deal about the nature of the system that is acquired the state of knowledge that is attained we can say a fair amount about the biological basis in the initial state of the mind for the acquisition of this system we are only respecting the universe we live in it's no more than how the planets orbit each other they're obeying the laws of physics and we're obeying the mental laws the metaphysics they call them metaphysics actually ontology is all about metaphysics how the world functions and that has not been obtained by going out to the park and trying three times until i discovered that thing like that's stupid we come equipped with that look even animals have that by the way a calf after two minutes starts walking and eating on its own if you look at newborns and obviously it has to do with evolution and how those that didn't obey the laws of nature didn't survive yeah so quite often people confuse or conflate empiricism with nativism nativists think that we have all of the cognitive apparatus already built into our mind but actually they're not synonymous i think what chomsky says is via some ethereal mechanism we are endowed with the laws of nature and that's how all of it by osmosis gets into our brain so you've got a really interesting view on this keith well yeah because i've been thinking about this you know chomsky said look the evidence is absolutely
clear that humans do not start off as a blank slate we have these endowments of prior knowledge one of them is obviously genetics we have a genetic endowment we have this code in our dna that results in structures that unfold and grow and sure they develop in response to the environment but they're growing from this encoded template in our genes but he says there's another possible source of this knowledge outside of experience which is the laws of nature and it's kind of a mystery how the laws of nature enter into our knowledge and i know chomsky would agree there's some mechanism there we just don't know what it is and i've been thinking about that like what is the mechanism and my guess is it really comes down to something very simple but profound at the same time which is survival okay at the end of the day life has evolved to create this circuitry and it doesn't matter what the circuitry is made out of it could be neurons or in lower monocellular organisms it could be in the dynamic pathways the concentrations of electrolytes or whatever is going on inside of them right but there's some circuitry okay one thing that enters into that circuitry is does it survive to see another day and so long before there were any organisms that did anything we would call observing you know sitting there observing nature and thinking about it they were developing this circuitry because as they randomly explored the space of possible circuits some circuits processed information in such a way that gave them an advantage in the real environment i.e. under the laws of nature and they survived and ones that didn't have processing that corresponded to the laws of nature were simply destroyed right and so the mere fact of survival of existence or continuing existence i think encodes
knowledge into life so it's almost a form of ontic knowledge knowledge that's there because of your factual existence and it had to correspond to some degree to nature and the laws of nature or you would have perished right and i think there's this interesting connection to friston's free energy principle because it's all about hey look if we take the assumption that definable things objects exist and that they continue to exist what must they do right that's the question he asks what must a definable entity something that has a boundary what he frames as a markov blanket what must it dynamically continue to do in order to continue existing and you get this free energy principle which is that whether you want to call it thinking or reasoning it doesn't matter it has a dynamics a dynamic behavior that mathematically corresponds to something like bayesian inference right it has to model predictions of the future and its interaction with the environment in order for it to continue existing or it would have been destroyed and so for me this is a fascinating idea that you have this endowment one part comes from genetics and one part comes from the evolution of life in general which is of course encoded genetically in us but it comes from the fact that life survived it's an existential form of knowledge but the interesting thing i mean first of all friston is an empiricist and it's quite easy to look over the history of our evolution to explain how these cognitive functions get implanted into our brain and i think that's a great explanation because then using those cognitive functions we can start to extrapolate into this much larger abstractive space but what you're saying is interesting because there is a kind of epistemic resonance between the cognitive templates
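the "something like bayesian inference" claim above can be illustrated with a toy example. this is only a sketch of exact bayesian updating, not friston's variational formalism, and the states, observation, and probabilities are invented for illustration:

```python
# toy bayesian update: an organism's "model" is a prior over two
# world states; an observation re-weights that prior by its
# likelihood. the free energy principle's claim, very roughly, is
# that systems which persist must implement something like this
# re-weighting. all numbers here are illustrative assumptions.

def bayes_update(prior, likelihood):
    # prior: dict state -> p(state)
    # likelihood: dict state -> p(observation | state)
    joint = {s: prior[s] * likelihood[s] for s in prior}
    evidence = sum(joint.values())
    return {s: joint[s] / evidence for s in joint}

prior = {"safe": 0.9, "danger": 0.1}
likelihood = {"safe": 0.2, "danger": 0.8}  # p(rustling sound | state)
posterior = bayes_update(prior, likelihood)
```

an organism whose dynamics track something like this posterior reacts to the rustling sound; one whose dynamics don't gets eaten, which is the survival-as-encoding point in the passage above.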
we have and the reality we live in for this precise reason but i still think it's possible for us to learn new cognitive templates because what walid says is you'll never be surprised by reality for that reason because the cognitive templates that we've been endowed with by nature are the templates that describe the universe yeah i agree and in this case it seems almost immediately obvious that we have the ability to learn new cognitive templates i mean we definitely do it humans are entering new realms in which we didn't evolve all the time it's certainly happening in the virtual space and we're certainly able to do that the question is are there limits to that and yeah there are obviously some very trivial kinds of limits to it but i think there may be these deeper limits too it's kind of this question chomsky raised of are we rats in a prime number maze are we missing certain concepts about the universe that we just may never be able to get i think it's an open question there's certainly as he puts it this innate drive to be radically empiricist to try and take our human level understandings of balls and apples and sticks and things like that and project them down to the level of quantum mechanics or far out into the stars at the galactic supercluster level so we definitely have some failings there and it's going to be a question that we'll be asking probably forever what are the limits of human cognition and i don't know chapter five the ghost in the machine some of you may have noticed that when we first released this video it got blocked by the bbc on copyright grounds we had included a small clip of richard feynman from a bbc horizon interview in 1980 and back in those days the bbc actually
made quality content remarkably so we also included that clip of bryan magee which was from the 1970s luckily they don't have a copyright block on that yeah anyway let's not go there i'm absolutely seething about it i believe this information belongs in the public domain i'm so annoyed with the bbc for blocking that anyway this is what richard feynman said i'm just going to quote it i have a friend who's an artist and he sometimes has taken a view which i don't agree with very well he'll hold up a flower and he'll say look how beautiful it is and i'll agree and he says you see i as an artist can see how beautiful this is but you as a scientist you take this all apart and it just becomes this dull thing and i think he's kind of nutty i mean the beauty that he sees is available to other people and to me too i believe and although i might not be as refined aesthetically as he is i can still appreciate the beauty of a flower at the same time i see much more about the flower than he sees i could imagine the cells in there the complicated actions inside which also have a beauty i mean it's not just the beauty at the dimension of one centimeter there's also beauty at a smaller dimension the inner structure also the processes the fact that the colors in the flower evolved in order to attract insects to pollinate it is interesting it means that the insects can see the color it adds a question does this aesthetic sense also exist in a lower form why is it aesthetic all kinds of interesting questions which the science only adds to the excitement the mystery and the awe of a flower it only adds i don't understand how it subtracts end quote the ghost in the machine is british philosopher gilbert ryle's derogatory description of rene descartes' mind body dualism now descartes as a man of scientific genius could not but endorse the claims of mechanics yet as a religious and moral man he could not accept as hobbes did the discouraging rider to these claims namely that
human nature differs only in degree of complexity from clockwork descartes and subsequent philosophers naturally but erroneously believed that they availed themselves of the following escape route since mental words are not to be construed as signifying the occurrence of mechanical processes and since the mechanical laws explain movements in space other laws must explain some of the non-spatial workings of the mind which is to say the ghost in the machine i'll talk some about isaac newton and his contributions to the study of mind he's not known for that but i think a case can be made that he did make substantial indirect but nevertheless substantial contributions i'd like to explain why there is a familiar view that the early scientific revolution beginning in and continuing through the 17th century provided humans with limitless explanatory power newton's greatest achievement was that while he seemed to draw the veil from some of the mysteries of nature he showed at the same time the imperfections of the mechanical philosophy and thereby restored nature's ultimate secrets to that obscurity in which they ever did and ever will remain the mechanical philosophy of course was the guiding doctrine of the scientific revolution it held that the world is a machine a grander version of the kind of automata that stimulated the imagination of the thinkers of the time much in the way programmed computers do today they were thinking of the remarkable clocks the artifacts constructed by skilled artisans and there is a further task that's to determine the scope and limits of human understanding incidentally some differently structured organisms some martians say might regard human mysteries as simple problems and might wonder that we can't find the answers or even ask the right questions just as we wonder about the inability of rats to run prime number mazes it's not because of limits of memory or other superficial constraints but because of the very design of our
cognitive nature and their cognitive nature so actually if you think it through i think it's quite clear that newton's remarkable achievements led to a significant lowering of the expectations of science a severe restriction on the role of intelligibility they furthermore demonstrated that it's an error to ridicule what's called the ghost in the machine that's what i and others were taught at your age in the best graduate schools harvard in my case but that's just a mistake newton did not exorcise the ghost rather he exorcised the machine he left the ghost completely intact and by so doing he inadvertently set the study of mind on quite a new course this is eric curiel from harvard university well the world is a complex place and our mathematical models of its parts are almost childishly recklessly simple how can a relation of representation hold between them this issue of complexity in the world is a very serious problem for the standard view of representation the second problem is what i call levels of abstraction in any given theory or framework within which one formulates theories there are many levels of abstraction at which one can write down the mathematical formula that one in standard parlance uses to represent physical theories in newtonian mechanics we have f equals m a that's about as general and abstract as one can possibly get does that represent in the same way as the expression for the newtonian gravitational force law f equals g m m r hat over r squared can f equals m a represent at all is there anything in the world that is a pure acceleration even in a world in which newtonian mechanics would be true putting aside the fact that it's not in fact true can the latter the newtonian gravitational force law represent in the same way as the equation modeling two perfect homogeneous spheres in a keplerian binary system without a specified target as an undergraduate when you're
learning newtonian gravitational theory and you write down the keplerian binary system and solve for it does it represent something i have no idea can that perfectly idealized keplerian binary system represent in the same way as a set of equations modeling the earth and sun as a concrete individual gravitationally coupled system with lunar and jovian perturbations accounted for do different levels of abstraction represent in the same way how does one decide when the mathematics is concrete enough to represent i don't know and nothing in the standard views gives me any clue as to how to answer that question really the key point that chomsky has made a few times and he made it with us when we were talking to him about what newton did how newton changed physics forever right is to show us that the universe is not intelligible to us at least in the sense that we can't take our common sense mechanical intuitions from the scale at which people operate where things have these mechanical properties right machines gears things have to touch each other to exert a force there has to be contact between objects and things like that because there are scales or regimes of physics where it's just not like that our intuition doesn't apply there and that was the problem with action at a distance like gravity right i mean it's exerting this force that behaves as if it was pointing at the instantaneous location of the other object and acting over vast distances instantaneously right and it doesn't help that sure years later we now have a better theory like gr but it introduces all kinds of things that are not intelligible to humans like curved space-time i mean how does that correspond to anything we perceive in reality at our level of cognition it doesn't right and so it's just not intelligible you have the math and it's
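the two newtonian formulas curiel contrasts, the abstract schema f = ma and the concrete gravitational law f = GmM/r², can be made numerically concrete. a small sketch using standard textbook values (the constants below are rounded reference figures, not taken from curiel's lecture):

```python
# curiel's two "levels of abstraction" in code: the concrete
# gravitational force law f = G*m*M/r**2 produces a force, and the
# abstract schema f = m*a turns it into an acceleration without
# knowing or caring where the force came from.

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24   # kg
M_SUN = 1.989e30     # kg
R = 1.496e11         # m, mean earth-sun distance

def grav_force(m, M, r):
    # the concrete newtonian gravitational force law
    return G * m * M / r**2

def accel(f, m):
    # the abstract schema f = m*a rearranged as a = f/m
    return f / m

f = grav_force(M_EARTH, M_SUN, R)
a = accel(f, M_EARTH)  # earth's acceleration toward the sun, ~6e-3 m/s^2
```

the point stands even in code: `accel` is schema-level and applies to any force whatsoever, while `grav_force` commits to a specific interaction, and it is a genuine question in what sense each "represents" the earth-sun system.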
almost like the school of quantum mechanics that says just shut up and calculate forget about trying to make it intelligible just use the tools and the math and yeah exactly well that's why when chomsky says that newton exorcised the machine but left the ghost intact he was saying that we no longer seek mechanical explanations the machine was exorcised from our theory of science but the ghost is still there right so after newton the problem was largely forgotten yeah and i think he might have made some points about why it was forgotten i mean i don't think it's entirely fair to say it was forgotten because he points out that newton struggled with this action at a distance for the whole rest of his life really just not willing to accept that there was a deep philosophical problem there and you look at things like russell with russell's paradox where it just crushed his vision of mathematics resting on a solid foundation but i think it's totally fair to say the ghost was left yeah we exorcised the machine and left the ghost there i kind of agree with that there's so much mystery left yeah it's interesting because now we try to build models of natural phenomena that are intelligible to us so the model is intelligible even if the underlying phenomenon is not and i guess chomsky would say that all modern science is like that basically whether it's quantum physics or biology or even linguistics and physics so recently i've been watching some interesting videos and lectures of eric curiel from the black hole initiative and there's a really cool lecture he gave the point of which was basically that mathematics does not represent and he has some very solid arguments for how we shouldn't even think of our mathematics as corresponding
to what's actually happening in in physical reality nor even that they represent what's happening faithfully even in that abstracted form you know the he has a solid argument to say look what it really is is it's a bridge mathematics is a bridge between two almost equally mysterious things one is the actual underlying physical reality which is we've been talking about is like not intelligible it's weird you know it does all kinds of things that are not describable really or or within the realm of our cognition and it's a bridge between that and something equally mysterious which are these abstractions and concepts that somehow exist up in our head that we can't directly look at you know we we perceive them and we think about them but it's not like we can really you know um understand where they come from like how do we how does abduction work right how do we how do we come up with these these concepts and these generalizations and this creative act and what are they really and there are these things that are always almost outside the boundary of our cognition and mathematics is just a bridge between those two things um yeah well this is this is really fascinating because it links back to that rat in the maze example that that you gave because if we shouldn't expect the study of nature to be reducible right to to models which are intelligible um it does make you wonder whether it's possible at all right to understand the world we live in with models why i think on the one hand an answer is it clearly is it's clearly we're clearly able to understand the world to an extremely pragmatically useful degree i mean because we have all this technology that we've built we've we've you know come to understand concepts at a scale so small that it's it's hard to believe anything's happening down there and so large that it'll forever be out of reach of anywhere that human beings can ever can ever get to physically right so in a strange way our mind is able to span and understand 
this vast reach of stuff happening and yet there's still infinitely more that's mysterious and and perhaps you know forever out of reach of our cognition and that's just it's really fascinating to me at least and beautiful in a way yeah and and these are the things that chomsky spoke about a lot in the interview with us so there's this notion of closedness which is that science thrives on reductionism so by separating one phenomenon or one effect from the rest of the world we gain the ability to model it to understand it and to reinsert it into the broader picture so you know things like physics experiments to theoretical computer sciences simplifications but you know this whole thing about mechanical philosophy that originated with galileo didn't it you know which is this idea that we can view the world as a machine and galileo insisted that theories are intelligible only if we can duplicate you know what they do by means of artificial devices which i think is fascinating yeah and and this kind of pokes a bunch of holes in that and in in a way i'm kind of hopeful that it frees us up to be even more creative with mathematics and science and it kind of it kind of gets at some of the aspects that you know you and and kind of stanley talk about pretty often too right which is you know we should be free right to experiment and have serendipity and creativity utilize those aspects of of of human cognition when understanding even fundamental physics um and you know this this analytic idea right of kind of splitting up things and and drilling down on the one component that does something actually has a lot of negative effects to say in medicine for example like you know we're always looking for the single molecule that will stave off disease or or cure an illness right whereas what we're trying what we're finding out now is that sometimes you need cocktails you need mixtures of multiple molecules you can't just distill it down to the one true essence of something you have 
to take a more holistic approach to medicine to health and maybe even things like physics and mathematics so i'm kind of hopeful actually that if people accept this it'll actually expand what we're able to understand um rather than reduce it exactly i just wanted to to close this bit talking about descartes as well which is another one of chomsky's heroes and he recognized the creative aspect of language and thought right which is this ability unique to humans that can't possibly be duplicated by machines so he said that you know language was innovative without bounds appropriate to the circumstances but not caused by them and can engender thoughts in others which they recognize that they could have expressed themselves so and this is actually a creative principle of the mind he called res cogitans as i remember chomsky noting which stood alongside res extensa you know which is this cartesian dualism you know the two substances and it's this idea that um you know descartes actually thought that there was a kind of separation between our body and the um the infinite set of expressions which could be created in the mind yeah and i think um was it galileo that he said considered you know the alphabet is the greatest the greatest invention was it because it's with this finite number of symbols you can express this infinity of of uh of concepts you know and and there's so many real mysteries here like where um the one you brought up with what chomsky called descartes problem you know how is it that we can have ideas and and linguistic expressions that are not directly caused by by the inputs that we're getting it's not like a an input stimulus response right it's actually something else is going on in there and we're generating a new a new linguistic expression that's maybe never even been uttered before and yet it somehow maps correctly in a sense to what's happening in the world to the to the situation and people are able to
understand it you know and how is that kind of freedom of the will if you will even possible because the best of our our science you know points certainly at one one interpretation of things that it's all deterministic um even if it's very non-linear and chaotic it's still deterministic or if it's random it's random in ways that certainly don't you know provide any type of this um freedom of will and you know he had this this hilarious kind of uh paraphrasing i guess of william james where he said you know if if you believe there's no freedom of the will why bother presenting an argument you know you're being forced to do it the person you're trying to convince can't be convinced because they don't have free will either so why why bother doing it at all i know and chomsky did say actually that this very human ability we have to select an action um you know given the given the circumstances using free will is is one of the biggest mysteries in in science but uh just to wrap up that i mean um yeah so chomsky kind of took us on an intellectual journey which is to say that in the olden days we used to think of the world as a machine you know it had a kind of mechanism and um and now we don't think of it that way anymore but what we do do is we construct intelligible theories around the world so you know we can't know what the world is but we can build theories um that a machine can compute so the world is not a machine our mind is a machine yeah i mean he said he did he did say he said that the goal changed from from the world being intelligible to we'll just we'll be satisfied with theories that are that are intelligible i'm not quite sure if if if he even believes that the the theories are are always intelligible because because sometimes you know they're almost on the on the boundary of of not intelligible um but i know well um i mean remember jeff hawkins said that um einstein's general relativity was actually quite intelligible because it uses a lot of everyday 
concepts that humans would understand but the thing is like einstein didn't actually explain um anything right all he did was he came up with this abstraction which was a model which was arguably intelligible but no one really understood it yeah and and and i think this is something that um that i was talking about eric curiel and that and that talk about mathematics doesn't represent he brings up the fact that even if you just stick with theories okay take something like general relativity there are multiple radically different formulations of general relativity okay like they they have very different sort of uh concepts atomic concepts built into the foundation and the you know i mentioned eric curiel earlier because in in that talk of his that mathematics doesn't represent from my perspective at least he brings up the even if you focus on the theories just the theories themselves not the world there's questions as to whether those theories are even intelligible themselves because if you take something like general relativity for example you know there are radically different formulations of it like there's the metric formulation the tetrad formulation there's three plus one dimension formulation there's chiral formulation there's four dimensional formulations and these things have radically different you know elements that that make them up so even even purely in the theory and in the mathematics you can have very different structures that that all map to the same underlying underlying reality so how you know how can they all be simultaneously intelligible when they have such radically different you know structures this is richard feynman again i'm quoting my father had taught me looking at a bird he says do you know what that bird is it's a brown throated thrush but in portuguese it's a haunted in italian it's a chute and in chinese it's uh he says now they know all the languages you want to know what the name of the bird is and when you finish with all that he 
says you'll know absolutely nothing whatever about the bird you'll only know about humans in different places and what they call the bird now he says let's look at the bird and what it is he told me how to notice things and one day when i was playing with what we call an express wagon which is a little wagon which has a railing around it for children to play with so they can pull it around it also has a ball in it and i remember this it had a ball in it and i pulled the wagon and i noticed something about the way the ball moved so i went to my father and i said hey pop i notice when i pull the wagon the ball rolls to the back of the wagon it rushes to the back of the wagon and when i'm pulling along i suddenly stopped the ball rolls to the front of the wagon and i said why is that and he says nobody knows he said the general principle is that when things are moving they try to keep moving and when things are standing still they tend to keep standing still unless you push on them really hard and he says the tendency is called inertia but nobody knows why that's true now that is a deep understanding he knew the difference between knowing the name of something and knowing something feynman said something can only ever be explained by taking something else for granted and at some point you need to stop this infinite regression and just admit that you need to take something as true on faith or simply admit that you cannot know chapter six uh this is a discussion of the fodor and pylyshyn uh paper in the 1980s on connectionism where they put this massive critique forwards which we think hasn't been answered yet um and their main argument centered around productivity and systematicity so here's a fact trains of thought are often like arguments in particular they often lead from entertaining true premises to entertaining true conclusions this fact engenders a problem the problem is suppose that the mind is a mechanism and after all what else could it be since its states have
causal powers supposing it is a mechanism it would be nice to know how a mechanism a piece of physical matter could have this property that minds have how the state transitions of a mechanism could be like arguments in the way that the state transitions of minds are like arguments this is the problem that cognitive science has made some progress on in my view it's in fact the problem that defines the field turing's idea was that you can explain the analogy between trains of thought and argument consonant with assuming that the mind is a mechanism if you also assume the following two things first that mental states are syntactically structured and second that the syntactic structure of a mental state determines its causal role in mental processes i sometimes refer to this as the language of thought picture of the mind because the most obvious example of things that have syntactic structure is sentences in fact it wouldn't be uh very misleading to say that turing's idea was that thinking is a syntactic operation on mental sentences and what turing argued i think persuasively enough so that one wants to follow out the research program that he set forward what turing argued was that if you assume that thinking is a syntactic operation on mental sentences then the nature of the analogy between trains of thought and arguments the truth preserving character of trains of thought can be made compatible with the mechanistic theory of the mind i mean at least all the connectionists i've talked to bite the bullet on that they say no the idealization to an infinite capacity is not allowable actually there's only a finite number of thoughts you could think even if you lived forever and even if memory restrictions and attention constraints and stuff like that are relaxed even under those conditions you could be in the position of having thought all the things you can think and running out of thoughts okay so that's a way of biting the bullet i think
the view that the mind is inherently finite is a bizarre view what i'll call systematicity arguments are supposed to do that they're supposed to be arguments for the turing model that don't require idealization to infinite capacity i'm going to try to set out that class of arguments i'll argue first that thought isn't just productive it's also as i say systematic and that the best explanation of systematicity presupposes a combinatorial syntax and semantics for mental representation just like the best explanation of productivity does so um i'll argue that it can't be the case that all mental representations are atomic so it can't be the case that the mind is a connectionist network that's the form of the argument fodor and pylyshyn wrote a seminal critique of connectionism in the late 1980s they released a paper called connectionism and cognitive architecture a critical analysis and in walid's opinion this critique of connectionism has still been unanswered in the paper they deride the term sub-symbolic it was a term that was invented in the 1980s by connectionists to describe how a representation can be sliced and diced and stored over many nodes in the neural network but fodor and pylyshyn do not think that sub-symbolic confers the kind of cognitive architecture which they think is possible with the classicist architecture there are systematic interrelations among the thoughts a thinker can entertain for example if you can entertain the thought that john loves mary then you can also entertain the thought that mary loves john so systematicity looks like a crucial property of human thought and thus demands a principled explanation what i have to do is to tell you what systematicity is and why it has this implication and here my strategy will be to start with natural language that is i'll show you uh what systematicity is and what the arguments for it are in the case of natural language and then show you why having defined the
notion for you mental representation thought must be systematic too okay so i'll take natural language english as a paradigm case of a system which is systematic chomsky and fodor also spoke a lot about productivity which is basically this idea that you can generate an infinite number of meanings from language in natural languages and this is just a way of making the productivity point that i was making before in natural languages there are always an infinite number of semantically distinct syntactic forms right you can say one plus one is two and two plus two is four and john believes it's raining mary said john believes it's raining bill thought mary said john believes it's raining it's surprising that bill thought that john said it so that's i mean natural language is always productive in that way jerry fodor and zenon pylyshyn in 1988 wrote a paper called connectionism and cognitive architecture a critical analysis and the funny thing was even back in 1987 these guys these classical ai guys they thought that the hype of neural networks and connectionism was ridiculously getting out of control and they wrote this paper so uh keith can you just give us a summary of the paper oh boy a summary of the paper first of all what i'll tell you about that paper is it's it's just chock full of goodness so i recommend um you know anyone go and read it it's very uh contains a huge wealth of of thought and argumentation and examples and even though it was written you know back then in 87 88 whenever it was you'll find that it's it's as applicable today as as it was you know back then um and and they come you know really the the crux of their argument is that um that there are fundamental differences between symbolic systems and connectionist kind of architectures right and they try to they try to really drill down on what those fundamental differences are and the fundamental differences that practically matter for like the capabilities um of these systems
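a minimal sketch in python of the productivity and systematicity points being discussed here, with an invented toy representation (the embedding rule and the tuple encoding are illustrative, not anything from the fodor and pylyshyn paper): one recursive rule generates semantically distinct sentences of unbounded depth, and the same combinatorial representation that builds "john loves mary" automatically builds "mary loves john"

```python
# productivity: a single recursive rule ("X believes that S")
# generates unboundedly many semantically distinct sentences
def embed(sentence, speakers):
    # wrap the sentence in one propositional-attitude report per speaker
    for s in speakers:
        sentence = f"{s} believes that {sentence}"
    return sentence

base = "it is raining"
print(embed(base, ["john"]))
# john believes that it is raining
print(embed(base, ["john", "mary", "bill"]))
# bill believes that mary believes that john believes that it is raining

# systematicity: a combinatorial encoding of "john loves mary"
# supports "mary loves john" for free -- just swap the arguments
def loves(x, y):
    return ("loves", x, y)

assert loves("john", "mary") == ("loves", "john", "mary")
assert loves("mary", "john") == ("loves", "mary", "john")
```

the point of the sketch is that nothing new has to be learned to entertain the swapped thought: the constituents and the combining rule already exist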
okay and really it kind of it hinges on uh you know i guess three main you know pillars although there's much more in the paper than just the discussion of this but the one is that that uh symbolic systems are productive productive systems by which they mean you can like take like say a formula for example that has variables and you can substitute in you know structures and you can produce you can generate you know more and more structures from this simple set of rules so think about like a context free grammar for example um you know you've got this kind of set of rules and from that you can generate this this infinite set of of uh sentences if you will that still follow a certain structure and every single sentence that you generate from that set of rules will uh will be consistent with that set of rules it'll it'll have a certain structure you know defined by that and in fact you can even be given a sentence and then parse it back to what was like the tree that generated it and so they had this productive nature um and this is by the way this is like kind of important to understand here and we talked about this a lot too this is the nature of of computation is that it's unbounded in time so you can always sit there and kind of iterate and like if you think of the turing machine you know it has like this tape that it can go and write down some symbols and it can go back and expand one and expand it more and you can kind of keep going for an unbounded amount of time you know producing producing a larger and larger you know productive result if you will everybody understands that in reality every machine that we're going to build is is finite okay we get that okay but there's a big difference between architecturally there's a big difference between finite state machines and machines that have this potentially infinite unbounded memory that they can kind of operate on so the idea that they're saying about these systems are productive okay and secondly they're also 
compositional which is almost the the opposite of this okay it's saying that you know you maintain the parts of a hole so even though you've you've created a structure which is say the sentence you still have kind of the words in there as these separate entities and you can go in and and pull them apart you can analytically reach in grab pieces of them and look at like say a phrase that's within that sentence so if you think about like say um uh you know boolean satisfiability like logical formulas you know here's like a three sat formula a bunch of terms and they're anded together and then or together and you're asking you know you're kind of analyzing whether or not you can assign variables to satisfy this structure like this is very kind of nand gate type type calculation logic you know suppose you have a part of a of a program that determines for this particular task this formula is important here like this formula if i if i operate on all the input variables in this way and i and i get a value out of that okay it's it's useful for something like determining whether or not it's a hot dog or a human face or whatnot and another part of the network that figures out you know or program if you will that figures out well there's a different another formula that's useful for some other task okay in symbolic world you've actually got the formulas and you still have their parts and their pieces and so you can have like a meta analysis that takes a look at those individual formulas and compares them and goes oh look these two things have the following terms in common so now maybe those terms are important for some purpose whereas in a connection network you know typically what happens and there's some caveats here that we can talk about but typically what happens is you've got a a neuron a node that's performing this calculation it collapses it all into a single output it's like i either fire or i don't fire with a certain value from that point on there's no more parts 
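a sketch of the compositionality point keith is making, assuming a toy encoding of formulas as nested tuples (an illustrative choice, not from the paper): in the symbolic case the parts of a formula stay accessible, so a meta-analysis can reach in and find the subterms two formulas share, whereas a neuron that has collapsed everything to a single 0.3 has no parts left to compare

```python
# formulas as nested tuples: ("and", a, b), ("or", a, b), ("not", a), or a variable name
def subterms(f):
    # yield every subformula of f, including f itself
    yield f
    if isinstance(f, tuple):
        for child in f[1:]:  # f[0] is the operator symbol
            yield from subterms(child)

f1 = ("and", ("or", "x", "y"), ("not", "z"))
f2 = ("or", ("not", "z"), "w")

# the meta-analysis: which pieces do the two formulas have in common?
shared = set(subterms(f1)) & set(subterms(f2))
print(shared)  # contains ('not', 'z') and 'z' -- the structure is still there to inspect
```

nothing comparable is available once the formula has been evaluated down to a number: the output of `eval`-ing f1 under an assignment has no subterms at all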
okay like it doesn't have a part it just has a signal 0.3 and you can't go in there and figure out okay well that 0.3 came from these particular terms for this input right there's no way to to get that back out again it's been convolved it's been collapsed it's been added up and the problem is you can say well yeah but you know some other nodes in the neural network for example in a connectionist network can have those other terms right but the the difficulty is then you wind up with this exponential blow up because how do you know in advance like which sub terms actually matter like okay then let's just do all of them let's just have a neuron for every single possible combination of the of the um input and it'll fire and then you know some subsequent layer can go and decide well i care about these these subsets that's where you get this exponential blow up problem from is you can't defer that you don't have parts that you can go back and piece apart and compare and analyze later when you figure out there was something you have to have baked in ahead of time all possible kinds of combinations okay has this got something to do with intension versus extension so as i understand the classical ai folks they want to maintain the intension and they don't want to materialize the output immediately so that in the future let's say i've done some processing i can now go back in time and decompose and recompose the um the computation yeah i think so i mean and i'm not sure if this is naive of me or not i don't know but i generally think of intension you know with the s as uh formulas you know they're the way i think about them concretely is it's a function from all possible worlds to worlds that that are true or worlds that are possibly true worlds right so an intension is saying look you have this this infinite you know space of possible worlds and a function on that that gives you a subset is an intension and you never actually need to materialize it as an actual
um you know as an actual extension that gives you that that set of true worlds because you have a procedure you have an algorithm you have a formula which can determine for any given you know possible world whether it meets that that criteria like that to me is the difference between intension and extension which is that the intension is a formula that could give you that you could iterate or apply you know to arbitrarily many you know worlds to generate um to generate the extension you know of that intension so the intension is like the generating function right for for an extension but but critically the building blocks are there so if you have the intension like for example i use the example of the discrete fourier transform so you can represent represent that symbolically and now you can change all of the components you can change n for example yeah because as soon as you materialize it you don't understand it anymore and you can't generalize it into slightly different circumstances right yeah exactly and this is what you know and this is what the natural like let's say the classic natural language understanding folks like walid would would talk about all day long which is that suppose you have a grammar okay it's it's pretty easy to write down a grammar you know i'm not saying it's easy to write down a grammar for natural language i'm just saying it's easy to write down a grammar like let's say a context free grammar or something and there's an infinite extension to that that grammar there's infinitely many you know countably infinitely many um sentences that that could be generated by that grammar and it's quite easy to build a machine that you can give it any one of those any one of those sentences okay and it can churn on it for a while for some you know finite period of time and come back and tell you whether or not that sentence comes from this grammar right it can decide whether or not that's a sentence in this in this language um
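the intension/extension point as a sketch, using an invented toy language rather than a natural-language grammar: the context-free language a^n b^n has a countably infinite extension, yet a short decision procedure (the intension, on this reading) settles membership for any candidate string without the extension ever being materialized

```python
# toy language: { "ab", "aabb", "aaabbb", ... } -- infinitely many sentences,
# yet a finite procedure decides membership for any given string
def in_language(s):
    n = len(s) // 2
    # a member must have even length, be non-empty, and be n a's followed by n b's
    return len(s) == 2 * n and n > 0 and s == "a" * n + "b" * n

assert in_language("aabb")
assert not in_language("abab")
assert in_language("a" * 1000 + "b" * 1000)  # decided without enumerating the extension
```

the procedure plays the role of the generating function described above: apply it to any candidate world (here, string) and it answers, and no infinite set ever needs to exist in memory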
but it doesn't need that infinite extension existing somewhere it doesn't have to be materialized because it just has an algorithm it just has a simple you know grammar that they can use to figure that out yeah okay okay well this makes a lot of sense so based on reading the lecun paper and this connectionism paper so far um lecun is saying we need to have um a probabilistic ish uh interpretation of possible futures the connectionism paper is saying that we need to have composable recomposable decomposable abstractions right and and this is very it's almost analogous to what francois chollet talks about with his library of modules and and and type 2 traversal et cetera he has this distinct dichotomy between type 1 and type 2. so yeah it's almost as if we're saying look neural networks at the moment there are two clear opportunities for improvement and what are what are those opportunities specifically again so being able to represent possible futures with some uncertainty quantification and secondarily being able to i mean i'm bagging it in with the discrete space but it doesn't necessarily have to be discrete functions but what we've just been speaking about from the fodor and pylyshyn paper composability rich abstractions maintaining the the structure or the intension and even later on in the computation cycle being able to introspect about how and why okay yeah so the other important thing that comes up in there in the fodor and pylyshyn paper is is one thing that's fundamentally different about symbolic systems is that the code the algorithm and the memory are separated okay and this is this is critical to understand here which is that you have an algorithm that can run on memory and if it ever runs out of memory you just need to add on another memory stick okay and it can continue processing with the same algorithm so the algorithm doesn't fundamentally change if you just enlarge the memory and this is a big difference
between connectionist systems right because in a in a neural network if you want to increase the size of the input for example like double the the vector space or something it's all got to be retrained okay because it's all tightly kind of convolved together the the memory and the computation this is a huge difference between these these um you know computational paradigms so i think it's important to understand that it is but i guess if i designed a system it reminds me of object-oriented design in programming and you create these abstractions unfortunately they are hand crafted although maybe it's possible that some of them are so platonic that they would apply in in many systems if you could only recompose them but the that ability to separate the storage from the from the computation seems to depend on that abstraction in how i design the program and lecun would say well if you have to handcraft the abstractions then learning's gone out the window i i agree with you i agree with both of you this is an extremely important problem we need to solve so once you go the route of saying i'm going to take my algorithm and extract it from the memory that's when you run into all these training problems right because now you're trying to train systems like differentiable neural computers and whatever these are things that have essentially that structure a turing machine consists of two parts really a finite state machine and an expandable memory okay and then it gets iterated and it can sit there and keep operating on it as soon as you do that separation and you say okay well i'm gonna have that finite state machine be a neural network that i can train differentiably the problem is now that it's abstracted from or separated if you will from its memory and you try to do things like have attention and you know be able to operate on that memory you run into all these kinds of training problems so it's almost like as soon as you if you've got the
memory and the computation all you know glued together in a single whole that's finite you can differentiably train it as soon as you separate it out and make the memory expandable now you start running into all these these training problems but we need to do that like we need to figure out how to train algorithms that are abstracted or not abstracted but separated from their memory and allow them to learn these abstractions that they can then use to operate on the memory like somehow we have to figure out how to do that chapter seven we're going to discuss how we rescued the broken recording of chomsky so interviewing professor chomsky was a dream come true for us i just never thought something like this would actually happen and the worst happened the unimaginable happened the recording messed up have a listen to this this was the before clip there's a curious distinction which and people empirically known for many years between sentences like one interpreter each seems to have been assigned to the dip the divots and this was the after clip there's a curious distinction which has been empirically known for many years between sentences like one interpreter each seems to have been assigned to the diplomats now it would be remiss of us not to use our combined expertise in computer science and machine learning and whatnot and to throw technology at the problem essentially so the irony wasn't lost on us that chomsky believes that deep learning uh isn't particularly valuable i mean i'm being a bit unfair though he did say that it was particularly valuable for things like speech transcription to you know to help him he's uh he's hearing impaired for goodness sake but we don't take l's here on mlst um taking l's by the way is a british colloquialism it means that um we do not accept uh losing so so dramatically that's why we took it upon ourselves to come up with a solution you know we were not going to let the rather minor matter of a corrupted recording get in the way of
us realizing our dreams you know this was the moment of our lives and we were not going to let it slip away now we recorded the podcast with riverside.fm which is the podcasting platform which is supposedly um supposed to prevent recording problems from wrecking your show ironically in our case it did the precise opposite i feel at this point like i could work as the engineering vp at riverside or something like that because i've got so much experience recovering every single possible failure mode on their platform i think i'd be quite a useful spare pair of hands to be around there now riverside of course blamed it on chomsky's hardware but i mean chomsky does so many podcast appearances i mean he probably does more podcast appearances than gary marcus writes blogs trashing connectionism when i listened back to the recording after the show because as we were recording the show we could hear that it didn't sound right and we were all saying to ourselves on on the side chat my god please please god please make the recording sound okay and much to our horror it was completely broken it was so bad i could just feel the entire life force of my being just draining away in that moment i could not believe it if our faces looked really concerned during the interview it was because we were so petrified about what was going to happen when we played back the recording so how did we fix this thing well um it's quite a long story to be honest but even though the recording sounded terrible the interesting thing was that when we ran it through a transcription service the results were still reasonably good so we had the word boundaries and what we wanted to do was synthesize chomsky saying the same thing so we painstakingly went through the script i mean keith must have spent about 17 hours word by word filling in the missing gaps inferring what words chomsky actually said so we created a transcript we then created a voice clone model so we started off with a tacotron 2 voice
clone which allowed us to get past the voice authentication system on overdub and then we recorded an overdub voice using mostly recent clips of chomsky but also some stuff from about five years ago so it's probably the voice of a slightly younger chomsky that you're listening to we synthesized absolutely everything but then we've got the lip sync problem uh the first thing we tried doing was using lipgan and that didn't work because chomsky has a beard and it didn't recognize his face so we de-bearded chomsky we made chomsky narrate the new script and while it was a success i kind of felt it was almost taking the piss a bit too much it looked like a cartoon character and we want to respect chomsky of course as much as possible so we decided to write a time warping algorithm to align frame by frame the original recording to the synthesized version and to do that we just transcribed the synthesized and the original version we used dynamic time warping so it's very similar to the needleman-wunsch algorithm in bioinformatics you just build an alignment matrix and then you compute every single cell as a function of the neighboring cells and then you trace back through the matrix from the bottom right to the top left levenshtein distance uses the same algorithm so you can keep track of matches insertions and deletions so yeah we created the minimum cost alignment between the two tracks and then between all of the aligned words we just did a linear frame interpolation so when he was saying hello in one script and hello in another script we just did linear frame interpolation and as you can imagine there's all sorts of things that can go wrong when you do that kind of coding because there's numerical precision problems but you know because we were
dealing with hundreds of thousands of frames so it was drifting over time maybe in the future we'll make another video about how we did that but needless to say it was in linear time complexity just for all of you recruiting managers at meta i'm sure you wanted to know that yeah so we did a pretty good job but we wanted to stress a few things so first of all we will make the original recording available to peruse because i want everyone to be completely clear that these are chomsky's words you can tell by the lip sync that it is indeed chomsky's words the amazing thing is that with the lip sync you can actually see chomsky's physical expression as he was saying the words so even though technically we've deep faked chomsky um it's amazing how when we synchronized his expression to the generated words it kind of just seemed so real it was like some invisible boundary of reality had been transgressed again and it really was chomsky saying all of that stuff so there are a couple of occasions where we've actually inserted a little bit of the original chomsky even though it was corrupted just to capture him chuckling or you know saying some words i mean i think there is a point where he said language models have achieved zero zero they've done nothing and obviously we just wanted to capture the original sentiment of chomsky when he was saying those things in certain parts the generated voice is roughly twice the speed of chomsky and we felt that was fair to be honest because even if we had recorded chomsky i might have sped it up a little bit it's not so much that chomsky talks slowly it's that he has gaps in his speech i've noticed this before when i've played clips of chomsky i've kind of tightened up the gaps in his speech so if anything it'll make it easier to listen to because there are fewer gaps in his speech so i wanted to touch on ethics because deep fakes are a huge topic at the moment i feel that this is a legitimate use of deep fakes we've basically used
engineering technology which is something that chomsky talks about in the podcast to recover a broken interview and we got chomsky's full permission so this is the email that we got back from chomsky by the way this email means so much to me personally i'm going to frame this email i should put this email in my cv just doing all of this work having this story to tell and getting this response from chomsky to me is something that i could tell my grandchildren about it really is that special to me so as i said before you can listen to the original version just to kind of satisfy yourself that we didn't put any words in his mouth although chomsky has checked the synthesized version and given us his blessing and i'll also make it clear as well that we will be deleting the voice clone so we're not going to use it again we're not going to give anyone else access to it it was just a temporary expedient and we will now delete that you could ask why we didn't just record it again with chomsky and the reason is just our mental states at the time of that interview the interview meant so much to us you can just see it in our faces it was like meeting your childhood hero and i don't think it would have been the same again if we kind of did it again i felt that chomsky's reactions were very novel i felt that he said a lot of things in this interview with us that he hasn't said anywhere else and he really trusted us and that means a lot to us he entered into a confidence with us at the end of the interview saying that um you know it was a really productive conversation and some of the conversations he's been having recently are quite um almost tiring for him you know because he's constantly having to push uphill basically and we're his friends you know we're really genuinely interested in what he had to say and in a way i think we were really seeing chomsky at his best and that is very
important and that's why i think that we wouldn't have been able to replicate that if we did it again but anyway um if you do want us to go into more technical detail about how we recovered that recording or just in general really i mean we're quite an intellectual podcast uh by the way we've been accused of being armchair philosophers there's lots of gatekeeping apparently uh going on on the ml reddit but yeah it's quite a funny story as well we had bated breath right so we had spent about a month fixing this recording um it was quite stressful for us because we've got a huge backlog um you know it took quite a while to get the code running we weren't working on the intro just everything was just getting out of control keith and i had a massive argument about it and all of this time i think we were stressed that it would be for nothing because we had to ask chomsky for his permission to publish it and he might say no anyway and it'd be like all of that time was wasted so um you know it was a really tricky situation for us because we felt cornered we felt like there was no other option this was the only thing that we could have done so um yeah it was quite an interesting story to tell at the end of it chapter eight language people like for example wilhelm von humboldt and rousseau they both grasped the idea that languages are basically infinite that they're expressions of human creativity in fact that's a leading cartesian idea at some core level part of human nature which is reflected on the cognitive side in things like language is the capacity to produce and understand and articulate and express new thoughts without limit and without control so the crucial fact about language use is that it's not determined by our situation it's coming out of us as freely willed action in some sense and continually novel and so on and to express thoughts and ideas that are new to oneself and other people but that are intelligible and appropriate and so on this
is a core aspect of human nature chomsky is a big believer in autonomy free will creativity and novelty which shouldn't be entirely surprising given that he's an anarcho-syndicalist it's really important for him that we are individual actors that have free will that is not determined by the situation we're in which is why he quite often says that language use is appropriate to the situation but not caused by the situation but the really interesting thing that he says about language is that it's an expression of human creativity language is an infinite space of possible expression which is what makes it so remarkable what is language and is there even such a thing as a pure language english is relatively homogeneous you can go a long way in the united states you know i mean i just came from boston and i understand everybody in portland and seattle and so on but that's not true in most of the world you can get very different languages pretty close by and much of the world is what we would call multilingual but what does it mean for a language to be pure or when people say they want english to be pure what are they talking about was shakespeare pure i mean first of all there is no such thing as a language there are just lots of different ways of speaking that different people have which are more or less similar to one another some of them may have prestige associated with them for example some of them may be the speech of a conquering group or a wealthy group or a priestly caste or one thing or another and we may decide okay those are the good ones and some other ones the bad ones but if social and political relations were reversed we'd draw the opposite conclusions chomsky is often referred to as the father of modern linguistics so for us having this rare opportunity to discuss minds and machines linguistics and cognition with professor noam chomsky is literally like having the chance to discuss syllogisms with socrates or having the chance to discuss the mind-body
dualism with rené descartes it really is that fascinating and it really did happen professor chomsky was so humble as to give us some of his precious time to discuss many contemporary issues especially as they relate to many hot topics in artificial intelligence chomsky's goal as a linguist is to find principles that are common to all languages which allow people to creatively speak freely and understand each other chomsky's work is so much more than linguistics he actually thinks that linguistics should be a branch of psychology and that so much about our language actually determines how we behave as human beings i think finding the principles common to all languages and understanding what enables us to speak freely and importantly creatively is noam chomsky's number one goal in life as a linguist chomsky said that when we study human language we're approaching what some might call the essence of humanity the human essence the distinctive qualities of mind that are so far as we know unique to humankind hey folks i really hope you enjoy the show with noam chomsky today i mean as you can tell from the introduction this has been an emotional rollercoaster for us just over the last couple of months or so we've done so much stuff to recover the recording to build the intro it's a slightly new domain for us as well getting into cognitive psychology but anyway please hit the like and subscribe button drop us a comment let us know what you think we've got six incredible shows coming up we're building introductions for them as we speak uh you know the likes of joscha bach and uh david ha and many others i don't want to spoil the surprise so yeah hit the subscribe button and i really hope you enjoy the show today cheers chapter 9 the chapter that you've been waiting for this is our discussion with chomsky enjoy professor chomsky is an american linguist philosopher cognitive scientist social critic and political activist sometimes called the father of modern linguistics
and is the most cited living academic he's a laureate professor of linguistics at the university of arizona and an institute professor emeritus at mit some of the big names which professor chomsky has influenced include steven pinker jerry fodor george lakoff and barbara partee professor chomsky it's an absolute honor to welcome you to mlst this is a dream come true for us i've still got 10 of your books on my bookshelf and i can't even believe we have this honor of speaking with you today very pleased to be with you large language models such as gpt-3 are receiving huge investment and are being hyped beyond belief this is happening despite very strong theoretical arguments for the futility of learning language from data alone the combinatorial complexity of language is on a scale which would eclipse any earthly data set there's also this problem of the so-called missing text that is to say human cognition extrapolates from common knowledge in order to understand text we can ascertain background knowledge which is never actually communicated in the text we believe that research into large language models is what francois chollet recently called make believe ai and is thus the road to nowhere gary marcus even calls it a parlor trick assuming that you do believe that large language models are not the solution for natural language understanding which paradigm do you think is the most promising well first we should ask the question whether large language models have achieved anything anything in this domain answer no they've achieved zero so to talk about the failures that's beside the point let me give you an analogy suppose that i submitted an article to a physics journal saying i've got a fantastic new theory that accommodates all the laws of nature the ones that are known the ones that have yet to be discovered and it's such an elegant theory that i can say it in two words anything goes okay that includes all the laws of nature the ones we know the ones we do not
know yet everything what's the problem the problem is they're not going to accept the paper because when you have a theory there are two kinds of questions you have to ask why are things this way why are things not that way if you don't get the second question you've done nothing gpt-3 has done nothing with a supercomputer it can look at 45 terabytes of data and find some superficial regularities which then it can imitate and it can do the same with all languages if i make up a language which violates every principle of language with 45 terabytes of data the same supercomputer will do the same thing in fact it's exactly like a physics paper that says anything goes so there's no point in looking at its deficiencies because it does nothing oh it does do something it wastes a lot of energy in california i should be more careful it has some engineering applications it can be used to improve live transcription for example which i'm very happy about because i like to use it i like bulldozers too it's a lot easier than cleaning the snow by hand but it's not a contribution to science so it's okay i mean if you want to use up all the energy in california to improve live transcription well okay gpt-4 is coming along which is supposed to have a trillion parameters it will be exactly the same it'll use even more energy and achieve exactly nothing for the same reasons so there's nothing to discuss it's exciting for the reporters in the new york times you probably saw the lead article in the times magazine a couple of weeks ago they're absolutely ecstatic we now have machines just like a human you can fool reporters but you shouldn't be able to fool computer scientists yeah first of all i can't say how much of an honor this is professor i mean um it goes without saying that uh we think uh you're one of the people that know something about language unlike what we hear these days uh so as you can imagine there are many questions that i can ask but i'm gonna ask a question that is uh
is about the current dominant paradigm uh i'd like to know your thoughts on the current rise of connectionism or the resurgence of connectionism let's put it that way and the ostensible success of deep learning and specifically i'd like to know do you think the classic fodor and pylyshyn critique that was written in the classic paper connectionism and cognitive architecture a critical analysis do you think the critique there has been answered or do you think the success of deep learning has been illusory well i think there's a good answer to this question an interesting research paper by a very good cognitive scientist at northeastern university iris berent she did a study which essentially shows that in brief empiricism is innate she found studying children adults and so on that they're automatically driven to radical empiricist conclusions it's just something that comes naturally to us okay that's connectionism no matter how much it's refuted it's all going to come back because it's an instinct our instinct is to try to find something like that it's kind of like what happened in the 17th century the problem in the seventeenth century was the nature of motion that was called the hard problem in those days how can you account for the fact that without contact you can make things move and there was already a mechanical science developed by descartes and i believe galileo newton and everyone actually newton in his principia showed that it doesn't work there are no machines nothing works the way a machine does it's an invention but it's not real it was very hard to deal with newton himself regarded the conclusion as a total absurdity and spent the rest of his life trying to refute it leibniz christiaan huygens the great scientists of the day they just insisted this is ridiculous and there's a reason the mechanical science is intuitive that's what we think about things study infants uh put two bars near each other but not touching and if they move
together the infant will assume there's a connection that's the way we're built it took a long time for physics to realize the world doesn't work that way the way we intuitively think about things is just not the way the world is maybe someday cognitive science will reach the level of physics in the 19th century and recognize that our intuitive concept of the world isn't the way it works so the fodor and pylyshyn critique was accurate as far as it goes but it barely touches the surface the whole approach is radically wrong everything we know about learning totally refutes it and what we call learning is mostly a kind of growth it's the growth of natural instincts in one or another way under the triggering and slight shaping of experience the entire framework of these things is wrong and you can see it very clearly in the case of use of language it's very easy to show by now it's even been experimentally demonstrated that from infancy as early as you can test less than two years old children are ignoring 100 percent of the evidence that they're presented with and relying totally on mental constructions that they never perceive i mean i can give you examples but we all do it all the time in our use of language we simply ignore all the data and use mental constructions and infants do it as soon as you test them it's not learning this is just instinctive behavior that's the way our visual system develops it's the way you come to walk the way your immune system develops and language develops and other things other aspects of knowledge in fact there are many things we know about language that we don't even know that we know we can't introspect into them you have to do experimental work to show that people know them but i mean that's just like other aspects of the organism you don't expect to be able to introspect into the functioning of your enteric nervous system the so-called second brain the huge nervous system very much like this one which is just down here and runs most of
your body it's an enormous nervous system that has billions of neurons which you can't introspect into but why should you be able to introspect into what's going on in there you can't you have to study it from the outside the way you study everything else well that's hard philosophers won't accept it at all they totally reject it just like they rejected newton as obviously absurd but the cognitive scientists occasionally look at it but don't really think about it the rest just aren't interested i mean we're back in the 16th century in these fields and we have to break out of that that's not easy it took a long time for physics to break out of it in fact newton's theories couldn't be taught at his own university cambridge for about 50 years after his death because they were so obviously absurd well if it's hard for physics it's going to be harder for cognitive scientists as a quick follow-up uh connectionism versus symbolic that debate has gone on for a long time do you think there's any credibility to what people call hybrid or uh neurosymbolic and it goes under different labels do you think there's anything to that approach at least okay learning everything from data probably is in some people's minds not practical but is there anything that neurosymbolic approaches or hybrid approaches can deliver to the whole debate yeah there's a lot of extremely intelligent exciting work it's not trivial work you know there was a lot of thought and understanding mathematical sophistication and so on in this work it just doesn't happen to be contributing to science it's contributing to other things like deep learning approaches have been very useful in protein folding for example they've really advanced understanding there it's a good engineering technique that is i mean i'm not critical of engineering i spent most of my life at the world's leading engineering institute mit it's terrific you know i mean it's useful for things
like the google translate uh live transcription speech recognition there are engineering projects that are significantly advanced by these methods and that's all to the good i think that engineering is not a trivial field it takes intelligence invention creativity they're great achievements does it contribute to science actually i think there was an interesting transition at mit where i was most of my life in the 1950s when i got there that was the time when ai was beginning marvin minsky herb simon other people alan turing and in their view it was supposed to be a study of the nature of intelligence it was a scientific field by now that's disappeared hardly anybody's interested but mit at that time was an engineering school there were great people in math and physics but they were basically teachers and engineers though it changed in about 10 years by the mid-1960s 10 years later mit was a science university engineering was unified every student no matter what engineering discipline they wanted to go into took the same fundamental courses in science and math you take basic physics basic chemistry biology math and then later on you apply it in aeronautical engineering or mechanical engineering whatever you're interested in that was a huge transition totally changed the nature of the institution they brought humanities in for the first time because the science students were interested in the humanities and what really happened was that for the first time the basic sciences had something to teach to engineers that hadn't happened before if in the 1950s you wanted to build a bridge or construct a lecture theater or something you just did it by skills that had been developed in the engineering profession well in the 1960s that was no longer the case physics and math really had something to tell you so you had to know something in order to move ahead well that took a long time you know that's physics after it was a developed field we're a long way away from that in the
cognitive sciences but unfortunately that kind of work that people like herb simon and marvin minsky were interested in has pretty much disappeared from ai it's become basically an engineering field though it has plenty of achievements as i say engineering is a very noble profession it just doesn't contribute to science professor chomsky other than the issues with large language models we discussed there is also the let's say minor matter that we're not silicon our biological wetware implements a kind of hybrid analog and digital computation which might realize aspects that are effectively impossible to replicate in digital circuits alone sir roger penrose goes as far as to hypothesize that our brains take advantage of quantum properties to access non-computable oracles making our brains what turing would have called oracle machines we'd like to ask where you stand on these points perhaps you are a computationalist who believes human cognition can be digitally replicated in silico or maybe you are open to the possibility that human brains are hypercomputers of some kind so what do you think well first of all i'm completely incompetent to have any opinion about roger penrose's theories about quantum properties you know i have no idea he's a smart guy obviously so you have to pay attention but i frankly don't think it matters not at this stage of understanding at this stage of understanding i don't see any reason to question the fact that we are organic creatures like the rest of nature that whatever's going on here is some property of organic matter whatever matter is and then if it could be duplicated in a silicon system it essentially wouldn't tell us anything it would tell us there's some general properties of this organic system which also exist in some other system maybe but since we don't know what's going on up here i don't see a lot of point in speculating about it but the very basic questions to deal with about the nature of what we know about
authentic language our ability to deal with what you and i are doing here there are fundamental questions about that and unless we have some grasp of those i really don't see a lot of point in speculating about quantum theoretic properties or possible silicon duplicates of what we don't understand here so yeah they're possible questions they just don't seem to be at least to me on the research agenda now penrose of course seems to be thinking about a serious problem a problem about memory the memory models that are studied are mostly neural net models and in fact deep learning is based on those there is a serious question about whether neural net models are even the right place to look uh i think here randy gallistel's work is very significant arguing that if you look at neural net models they simply don't have the capacity to have the basic elements of a turing machine the core of computing is going to be some form of turing machine he's argued i think pretty persuasively you simply can't find those elements in neural net models no matter how you proliferate them uh penrose has recently picked up the same idea he's argued as gallistel has that computing is not taking place in neural net models that there's a lot more reasons to think so rather it's at a much deeper level you know maybe even in rna if you look deep inside the cell there's huge computing capacity it goes way beyond what you can achieve in neural nets neural nets are also in trouble there's a big problem that goes back to helmholtz they're damn slow neural transmission is slow of course not by our standards but by the standards of anything you need for computing and if you go down to i mean there already is work showing that purkinje cells have huge computing capacity internally they're very big cells internally without any external connections and maybe that's the source of uh where computing is
really going on in the brain i think that's the kind of thing that penrose is doing he argues that at that level you do have quantum effects well maybe so um i can't make any judgment about that but there's a larger problem behind this work the question of whether the whole framework of neural net models is even appropriate uh the way gallistel puts it are we like the drunk looking under the wrong lamppost you know because that's where the light is you believe that there are limits to human understanding uh mysteries of nature which human intelligence may never grasp and cannot formalize professor kenneth stanley goes further and claims that the veneer of formalism in particular the formalism of metrics and objectives may paradoxically impede scientific progress by blinding us to creativity and serendipity in exploration and learning he believes that open-ended exploration something which he calls treasure hunting is necessary to find valuable stepping stones which might lead to greatness that is to say stepping stones which formal objectives would have blocked us from discovering i just wondered what do you think of this view well first are there questions that can be formulated that are outside of our cognitive range i think it would be a miracle if it's not true unless we're angels that's going to be true if we are organic creatures part of the organic world then there'll be scope and limits to our capacities in fact the scope and limits are related so i have the capacity to walk much faster than a chimpanzee much better than an eagle but by the same token i can't fly or jump around trees the same intrinsic characteristics that provide me with capacities impose limits but well that's almost a bit of logic not exactly but it's pretty close so if we are organic creatures we're going to be like other organic creatures and there are bounds to our cognitive capacities so for example a rat can be trained to run pretty complicated
mazes but it can't be trained to learn a prime number maze turn right at every prime number it just doesn't have the concept and no matter how much training you do you're not going to get anywhere well i suspect there's reasons to suppose we're like rats we have capacities we have a nature we have a structure that yields an extensive range of things that we can do but they probably impose limits and i think we could even make some guess about what these limits are actually one of them was suggested pretty strongly in the 17th century i mean we're not any smarter than newton galileo leibniz nothing relevant has been learned to help alleviate their concerns and i think we have the same concerns to me it's as much of an absurdity as it was to newton that i can move the moon by raising my hand total absurdity okay they did regard it as an absurdity leibniz did galileo did they wanted an explanation in terms of an intelligible universe that was the goal of early modern science let's find an intelligible universe and an intelligible universe meant mechanical science something that skilled artisans could construct like incredible clocks other objects that skilled artisans were constructing in europe at the time which almost acted like humans so that's the way the world is directed by a super skilled artisan whether there was a deist god a retired engineer who set it up and then let it run by itself was the big issue at the time but the point is the universe ought to be intelligible newton showed that the universe is not intelligible and what happened after newton took a long time science just reduced its aspirations it doesn't seek to find an intelligible universe it just seeks to find intelligible theories about the universe so leibniz could understand newton's theories the theories were intelligible it was the world that they were describing that was unintelligible well that's a big shift in the nature of science it wasn't particularly recognized but
it just became tacit you don't even look for intelligibility anymore you want a theory that meets the conditions of intelligibility for a theory for example we get to what we started with the theory is no good at all unless it tells you why things are not this way okay otherwise it's not a theory in the least gpt-3 deep learning and these approaches don't enter the domain of theories you can't even look at them so there's nothing to say about them they're like my anything goes theory but um that's a condition that theories have to meet but what the world is like we have nothing to say about it whatever crazy things physicists come up with okay if that's the way it is that's the way it is you know it's not intelligible too bad for my cognitive capacities well what are the mysteries in this universe that are beyond our scope i think we can make some guesses there are questions that have been asked for thousands of years where we have made zero progress not even bad ideas about them one of them is the 17th century hard problem motion we've given up on that motion is whatever physicists tell us if it's gravitons in a quantum system okay then that's what it is beyond our capacity to conceive of except as we understand the theory uh so i think that's one candidate another candidate is what you and i are now doing that's been a problem for thousands of years how we are constructing in our minds infinitely many thoughts and picking one of them out how do we do that and then communicating it in a way which allows others who have no access to our thoughts to grasp what's internal to our minds how on earth do we do this galileo regarded this as one of the great miracles of the universe and it's totally beyond our understanding galileo regarded the alphabet as the most spectacular of human inventions because it somehow captured this miracle with a finite number of symbols you can not only construct an infinite number of thoughts which is miraculous enough but you can also
pick one of them out and use it to convey to others the internal workings of your mind. We have absolutely no idea — not even bad ideas — about how any of this can go on. In fact, we don't even have any idea how I can decide to lift my little finger. None; it's just a total mystery. Of course you can make claims about it, but you can't do anything about it. Well, maybe it's just beyond our cognitive capacities, and I think there are examples like that where we just hit a blank wall and can't do anything, whether or not there are further things to say about it. I mean, even ordinary, normal creativity — the kind that goes on in speaking. Normal speaking is a highly creative act; a scientific invention is a greater creative act; great art is an even greater act. But all those things are totally beyond our comprehension, from lifting my little finger to writing a Beethoven quartet, or talking. We just have nothing to say about it.

Fascinating — it's a cognitive horizon. This is regarding the many theories of semantics that have cropped up over the years, for example truth-conditional semantics, logical semantics, ontological semantics, etc. Which, if any, paradigms of semantics do you think are headed in the right direction, as far as getting us closer to an actual science of semantics? Or will we ever have a formal science of semantics, as Montague thought?

Well, I think there's very rich, exciting work in what's called semantics. It's been one of the most lively fields of theory in linguistics, philosophy, and cognitive science in recent years. You mentioned Barbara Partee earlier, one of the pioneers in this field — great work. But it's not semantics; it's syntax. It's all the study of symbolic manipulations that go on in the mind. Suppose you do model-theoretic semantics, the kind Barbara Partee does. When you do model-theoretic semantics, what you do is identify certain individuals and certain predicates, and you ask how the predicates are distributed over the individuals under various conditions. What are the individuals? Mental objects — not things in the world; they are mental objects, or something like that. Do they correspond to anything in the world? Very loosely. If you actually look carefully at the meanings of words, there's a very loose connection to anything in the outside world. Take Aristotle's example — he discusses this. His example is "house." So what's a house? Well, in his Metaphysics, a house is a combination of form and matter. The matter of a house is the bricks, the timber — things that a physicist could find. The form of the house is the intention of the designer, the characteristic use — things that are in the mind. In fact that's what a house is: the thing could look exactly like a house to a physicist and not be a house. It could be a library, it could be a stable, it could be a paperweight for a giant — it could be anything, because the meaning of every word is largely a matter of our conceptual structures. And that's true of the simplest words you find. Actually, the first example used in early Greek philosophy was "river" — Heraclitus, the pre-Socratic, asking whether you can cross the same river twice. It's a pretty deep question if you think about it: the second time you cross it, it's a totally different physical object, but it's the same river. When you start looking at that, the form — what we construct in our minds — is what constitutes a river. I happen to live in Arizona now; on my way to the university I cross something called the Rillito River. I have yet to see a drop of water. Old-timers tell me that if you go with them during the monsoon there's sometimes water flowing, but it's the Rillito River. If it got paved over and started to be used for commuting, it would be the Rillito Highway — the same object. And that's true for every word in the language. There is simply no semantics in natural language — at least semantics in the sense of Frege, Tarski, Carnap, Quine, any formal semantics. It just doesn't exist in language. We have mental operations going on that have some loose relation to the
outside world, but it's not truth and it's not reference. There's just no such thing. So what's the best approach to this? In my own view, the most productive approaches are what are called event semantics, neo-Davidsonian, developed by Paul Pietroski, Barry Schein, and a number of others, which essentially started with a question like: if we say "John read the book quickly," why can we infer that John read the book? That was the original question, and the proposed answer is: there's an event, reading; there's an agent, John; there's a theme, the book; and there's an adverb, the modifier of the event, quickly. If you analyze it that way, it's just conjunction, and you get the inferences. That's been developed extensively by people like Pietroski and Schein, among others. It happens to fit very naturally, too, with what I think we're coming to understand as pure syntax — it seems that pure syntax provides structures of that nature, which fit very naturally into event semantics. But notice that event semantics is syntax. When you talk about an event, it's not anything in the world; it's something that we construct in our minds. There was a gentleman named Zeno who taught us something about that: how many events are there when I cross a room? As many as you decide to put there — there's no end, up to the power of the continuum. So event semantics, I think, is productive, as a form of syntax. Then comes another question: how do all these things going on in our minds relate to the outside world? That's one of those questions I don't think we have any answer to. Now we're back to Galileo's problem: how do we do these things? We don't know. We do them — we do a lot of things — but we have no understanding, and we'll probably never have an understanding of it.

I'd like to ask you, Professor, about what you think is the relationship between what you have called universal grammar, or the I-language, and Fodor's language of thought, which has also been quite a prominent theory in linguistics and cognitive science. If we can suppose that both are innate human systems, endowed by genetics and/or the laws of nature, is that a similarity? So basically: how are they similar, and how are they different?

That gets to the heart of current advanced inquiry, in my opinion. Jerry Fodor was a close personal friend; we talked about these things all the time. But ask yourself: what's the language of thought? As far as I know, it's English. Do you know anything about the language of thought that isn't English? Not in Jerry Fodor's work — just English. Well, of course it's not English, but it should be what is common to human languages. Whatever is common to human languages — that should be the language of thought. And what's common to human languages? Universal grammar. That's just its definition: it's the name of whatever is the core of all languages, whatever it turns out to be. Asking what universal grammar is is like asking what the laws of nature are. Well, try to find them — make the best cases, find out they're wrong, find better ones, and so on. That's universal grammar. But there is debate in the cognitive sciences about whether universal grammar exists. That's another illustration of the pre-scientific character of cognitive science — the question is meaningless. There's something that distinguishes a human infant from a chimpanzee with regard to language. If you don't agree with that, you're a flat-earther. But if you agree with it, the next question is: what is it? Answer: universal grammar, whatever that turns out to be. The question about its existence doesn't even arise. It's like arguing with somebody who says everything's done by angels — a useless discussion. So the question is: what is it that's distinctive about human language? That's what's important. There is a long tradition, going back to Aristotle in fact, rife for centuries into the
20th century, which assumed that what a language is is a system for generating thought. Language was sometimes defined as audible thought. We now know "audible" is too narrow — it can be signed, in other sensorimotor systems; that's irrelevant. It's like a computer program that can be hooked up to any printer; it doesn't care. But the internal thing, like the computer program — the I-language — that's a system of thought, and the language of thought will be whatever that system happens to compute. It's probably identical among humans: as far as we know, there's no distinction among humans, at least in the capacity to acquire a language. Any infant, as far as we know, can acquire any language with equal facility, so it's probably uniform, which would not be very surprising. But humans are a very recent species — a couple of hundred thousand years, a blink of an eye in evolutionary time — and we know from genomic evidence that humans began to separate on the order of 150,000 years ago, and they all share the language faculty equally, and there's no evidence that it existed at all before modern Homo sapiens. So there's a very narrow window in which it seems to have emerged, and it probably hasn't changed since. So we have certain expectations: it should be something very simple, something that just follows from natural law. If you look at the way evolution works — not stories, actual evolution — it basically has three stages. In the first stage you have a system functioning; then some random disruption takes place — a truly random mutation, a gene transfer, some bacterium by accident swallows another microorganism, which gives you variety, eukaryotic cells, complex life. Random events take place and change the structure of the system, and nature comes along at that stage and finds the simplest solution to whatever developed — what Einstein once called the miracle creed, the law of least effort. It always seems to work in every branch of science: whenever you understand anything, it turns out it was the simplest solution. That's the way nature works. You can't really give an explanation, but it's so overwhelmingly supported that nobody even questions it, and if you don't have the simplest solution, you figure you're wrong. That's the ordinary scientific way; as I say, Einstein just called it the miracle creed — that's the way it is, that's how nature is. So we'd expect that when some random event took place which provided Homo sapiens with the capacity for recursive enumeration — the fundamental property of the computational system; no other organism has it, it's nowhere else, and it's uniform in humans — nature came along and said: OK, let's find the simplest way of handling recursive enumeration, with some special conditions. Namely, it has to produce thought, so it has to have at least some primitive way of having conceptual entities which enter into thought — probably something like event semantics: maybe we can see the world in terms of events, agents, patients, modifications, and so on. Put that together with recursive enumeration, find the simplest possible solution, and that ought to be universal grammar. The task of researchers in linguistics and cognitive science ought to be to see whether the simplest possible solution to this conundrum yields explanations for the phenomena of language. That's the task of the field. Almost nobody's interested in it — you can count the number of people on the fingers of one hand — but that's what the field ought to be, and I think it's progressing. I think we're maybe entering a new era where, for the first time ever, we can give genuine explanations for fundamental properties of language. In fact, one of them I mentioned: the most striking, dramatic feature of language is what's called structure dependence — the fact that from infancy, every human understands, unconsciously, that all the rules and operations of language
have to ignore the linear order of words and deal just with structures — ignore everything you've heard, deal with the abstract structures in your mind. You can demonstrate this directly, overwhelmingly; that's the way it works. We now have an explanation for it: it turns out that this is what follows from the simplest combinatorial operation, and the simplest combinatorial operation happens to be binary set formation — what's called Merge in the contemporary literature. Well, if language is based on binary set formation, you get this property: no linear order, just structures. So we have, for the first time ever, a deep explanation for the most fundamental property of language — a very surprising property, which tells you something about learning, cognition, and so on. Almost nobody's interested in it. Take a look at the literature in cognitive science: there's an endless number of papers trying to show that by massive statistical analysis of huge amounts of data you can begin to approximate it. But you can explain nothing. Of course, they all failed — and it's not interesting. Why try it in the first place? We have a perfect explanation, the best possible, for a fundamental mental property. What's the point of trying to see if a couple of supercomputers and massive amounts of data can approximate it? It's madness, but that's the field we're in. And it's very hard to get this across; the idea of finding an explanation for something is just not of interest to people. It was for Turing; it was for Marvin Minsky, whom I knew pretty well, Herb Simon, McCarthy — the pioneers of AI. That was interesting to them; they tried. Well, it wasn't easy at that time, and it was given up. In most of the field by now, that's considered old-fashioned nonsense — we don't care about that stuff anymore. In other words, we don't care about anything of any interest; we just want little things that make some money.
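[Editor's note: the point about Merge as binary set formation can be sketched in a few lines of code. This is an illustrative toy under my own assumptions, not anything presented in the interview or a faithful model of minimalist syntax: a two-element set carries hierarchy via containment but has no first or second member, so an operation defined over such objects can only "see" structure, never linear order.]

```python
# Toy sketch of Merge as binary set formation (illustrative only).

def merge(x, y):
    """Merge two syntactic objects into an unordered binary set."""
    return frozenset({x, y})

# Build {read, {the, book}} -- a hierarchical object.
vp1 = merge("read", merge("the", "book"))

# Building it "in the other order" yields the very same object,
# because sets record containment (structure) but no left/right order.
vp2 = merge(merge("book", "the"), "read")

print(vp1 == vp2)  # True: the objects are identical; there is no word
                   # order inside them for a rule to refer to
```

Any rule stated over objects like these can only consult containment relations, which is one crude way to picture why operations defined on Merge-built structures would be structure-dependent rather than linear-order-dependent.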
OK — that's probably where the field will develop; that's where the money is, and the jobs. But it's a sham, I think. Some people try to hang on to the old ideas, since they want to do something of intellectual interest.

Which profound misunderstandings of language and linguistics persist even at the highest levels of the scientific community? And do you mind that many of your own scientific ideas are widely misunderstood?

Totally misunderstood — it's amazing to me. So there's one paper of mine that I presume you know about; it was 1956, "Three Models for the Description of Language," and it got into the literature. I've been very interested in the fact that for 60 years the paper has been totally misunderstood, and no matter how many times I try to explain it, I can't make any progress. You probably haven't looked at that paper, but you get material drawn from it in elementary introductions to cognitive science — what's called the Chomsky hierarchy — which is all the stuff I said you shouldn't be looking at. That's what the paper said. The paper had three models. One of them was Markovian sources, which everybody was using at the time, and I argued it can't work. The second model was rewriting systems — Post rewriting systems — which, if you put certain conditions on them, give you context-sensitive grammars, context-free grammars, finite automata: the hierarchy. There was a third model — that's why it's called "Three Models." The third model was the only one I thought made any sense, because it began to provide some explanations for things; the other ones were just descriptive models, wrong descriptive models. But the only thing that's come out of that paper is the wrong models. There's a huge literature on context-free grammar and context-sensitive grammar — I've written about it; it raises some interesting questions about automata theory. Context-free grammars happen to be broadly equivalent to pushdown-storage automata, which is a kind of useful result, but basically they tell you nothing about language. These wrong models are the only ones that have been studied, and in all the literature of the last 60 years, nobody's noticed there's a third model — though the paper is called "Three Models," and the point of the paper is to show that the other models just don't work. Well, it's only been 60 years; maybe somebody will notice. But the more interesting question was the first part: what are the misunderstandings? I think a useful way to look at it is to think of what the major problems have been — they've actually been given names, so let's use the names. Plato's problem: how can we know so much with so little evidence? Well, that problem is badly misunderstood. We started off today with one of the misunderstandings: the idea that if you have enough data — trillions and trillions of items of it — and a battery of supercomputers working, you're going to deal with this problem. No, you're not. You're going to get nowhere, and you can show in advance that you're going to get nowhere. That's one misinterpretation, and it holds not just for computer science; it holds for philosophy, for linguistics — a total misunderstanding. I think, probably for experiential reasons, we're instinctively empiricist; it persists, and it's hard to get out of — it took until the 17th and 18th centuries for physics to break out of it. Another problem is what's sometimes called Darwin's problem: how did we get this language system? It's innate, it's common to humans with no variation as far as we know, and there's no trace of it in any other organism. There's been a lot of time wasted trying to train poor chimpanzees to duplicate some of these things, which makes about as much sense as trying to train graduate students to do the waggle dance of the bees. You could train them to somehow mimic it, maybe, but it would be idiotic — and it's equally idiotic to try to train a chimpanzee to do what we're doing. Well, for the first one — the graduate students — you can't get an NSF grant for that
because it's so obviously idiotic. For the second one, which is equally idiotic, you can get grants, and a lot of people are working on it. That's part of the overwhelming irrationality of the way the human mind is studied. That's Darwin's problem. Now, the last problem is what's now called Descartes' problem: how can we do what we're doing right now? How can we speak in ways which are appropriate to situations but not caused by them? That's a huge problem, which is basically the question of freedom of choice. How can we do it? We're involved in it all our lives. And it's very interesting: if you look at the philosophical literature — the scientific literature is sort of interesting too — virtually a hundred percent of the people who think about it say we're all determined, we're all thermostats, everything we do is totally determined. Then you look at the behavior of the same people: a hundred percent of the time they act as if they're not determined. And it's this way constantly. Even the people who give arguments that we're determined are tacitly assuming we're not determined — otherwise, why give the argument? If it's all determined, if you're a thermostat, why bother? That goes all the way up to Einstein, who gave arguments trying to show that everything's determined, proving by his effort to do it that he was not determined. So here we have a kind of paradox: everybody says we're automata; everybody acts all the time as if we're not automata. Will science say anything about this? Nothing. Science says: we can't deal with it. Science can deal with determinism; it can deal with randomness; but it can't deal with things that are neither determined nor random. So here's the situation: some extraterrestrial intelligence is looking at us, these strange beasts down here. A hundred percent of the time they say they're determined; a hundred percent of the time they act as if they're not determined; and they believe a science which tells them nothing. All right — is it a misunderstanding? Sure seems to me like a deep one. So if you go to Plato's problem, Darwin's problem, Descartes' problem, you see very profound misunderstandings which dominate the fields — all the fields. Philosophy, which is supposed to be sophisticated, is probably the worst of all. Maybe I can add one point about confusion. If you look at the philosophical literature today, there's something everyone's obsessed with; it's called the hard problem. The problem is consciousness: what is it like to see the sun rise? Let's go back to the 17th century, which is an interesting century — the birth of modern science. They had a hard problem, as it was called: motion. How was the hard problem dealt with in the 17th century? Well, properties of motion were formulated. They said: here are some properties of motion — Galileo's experiments and so on, actually thought experiments; he never carried them out, and if he had tried to carry them out, they never would have worked. They were thought experiments: you drop a ball from the top of the mast of a moving sailboat and it falls to the base of the mast, not to the back. A thought experiment — if he had tried it, he would have gotten craziness. But there were properties of motion that were established, and then came the hard problem: how can we explain the properties of motion? Well, the answer was: we can't. So we give it up and move to something else — constructing theories of the properties of motion without the motion itself being intelligible. That's the 17th century. Now let's go to the 20th and 21st centuries. There's something called the hard problem: what is it like to see the sun rise? There's a step missing — the step that was taken in the 17th century. What are the properties of its being like to see the sunrise? I can't tell you. I could write a story about it, write a poem about it, but I can't say: here are the properties of what it's like to see the sun rise. So what's being posed is an unanswerable question — a question that can't be answered. So if you
pose an unanswerable question, you're not going to get an answer. The question can be answered only if you can formulate it so that you can say: here's what I'm trying to explain. But you've got this huge literature — a total obsession in philosophy of mind — on how to answer an unanswerable question. One philosopher, a younger philosopher, Galen Strawson, commented that the 20th century must be the silliest century in the history of philosophy. He's exaggerating, but not by too much. We're in a strange period in all of these fields; we're just consumed with elementary misunderstandings, and it's very hard to break out of them. It's a very irrational period, reminiscent of the period when people argued about the right interpretation of the Eucharist. A strange period: a lot of sophistication, but massive misunderstanding.

I'm curious, on the side of things that we might be able to answer: what do you think are some of the greatest remaining mysteries of language, science, or philosophy which we have yet to solve but may be able to answer? And which are some of the areas of research that you find most personally exciting to those ends?

Well, I think there are questions right at the border of research which are very exciting, like the one I mentioned. I said that, for the first time in thousands of years, we can now give deep explanations of some fundamental properties of language and thought — which are probably the same thing. Let's assume that language and thought are two different ways of looking at the same thing: thought is what is generated by language, and language is what generates thought. One of the deepest mysteries is this property of structure dependence that I mentioned — in fact, it's the deepest property, and a very surprising one — and we now have a perfect answer to it. It's accounted for by the assumption that nature acts the way nature always acts, by finding the simplest solution, and the simplest solution for a combinatorial system happens to be binary set formation. If you look a little further there's much more to it, but that's the beginning of it, and now I think we can press forward with that as what should be a component of language. Assuming that nature is perfect, you look and you find there are some things which are there only because they follow from these properties. So there's a subpart of language called control theory. Now I have to get into details, but there's a curious distinction, which has been empirically known for many years, between sentences like "One interpreter each seems to have been assigned to the diplomats" and "One interpreter each tried to be assigned to the diplomats." The first one's OK; the second one isn't. Why? Well, the answer lies in a whole system called control theory, and we now have an explanation for it in terms of the assumption that nature picked the simplest possible answer. That kind of question can now be raised for the first time, and sometimes answered. I think that's an exciting moment in the history of the study of mind. And I see it interacting with what to me, at least, seems the most promising mode of formal semantics, namely event semantics, which links very closely to it. So if you can work out the ton of remaining detail — it's not elementary, but you can imagine how you'd work it out — you would have a basic answer to the structure and understanding of our system of thought and expression. That would be a pretty exciting development. I think there are many others. For example — this goes way beyond my competence, so I can just state the problems, not what they mean — a friend, a quantum physicist, recently sent me a paper in a quantum-theory journal which was a symposium of half a dozen leading figures in quantum physics. They were discussing: what is a particle? They have no idea. They don't know what a particle is. It's a big thing you have to talk about in physics; they have a lot of vague ideas about
it, but they don't know the answer. Well, that sounds like an interesting question to me: what is a particle? And here we go back to the question of consciousness, and the strange way the topic is studied. Go back to Newton again. Newton recognized that we know so little about matter that we cannot say that no matter has life — maybe a stone has life. We know so little about matter, so little about life, that we can't say whether all matter has life. Back in the last century, Sir Arthur Eddington, the great astrophysicist, said we know so little about matter that we can't say whether all matter is conscious — not because we don't know about consciousness; we know a massive amount about consciousness, more than about anything else — but because we know nothing about matter. So we can't say whether all matter is conscious. Well, that's a question too. I don't think a stone is conscious, but do I know enough about matter to explain why a stone isn't conscious? That's a lack of knowledge about matter. So we — not me, but advanced physics — will pursue the question: can we find out enough about matter to answer questions like this, or even to tell us what a particle is? Those are questions at the border of research. And in every area you have such questions: what's life versus non-life? In any area of science you're overwhelmed by questions like these. Some of them are at the border of research, so you can hope to go forward; others are so far beyond it that you can't even speculate sensibly about them; and some of them may turn out to be true mysteries for humans, just beyond our cognitive capacities, which you'd have to study in independent ways — kind of sideways. Well, among the problems in the language area: what's the neural basis for language? Something's going on in our brains when we're doing this — what's going on? It's very hard to find out, partly for ethical reasons: you just can't do the experiments that might give you some answers, because they would be unethical. You can't stick an electrode into a particular neuron in Broca's area to find out what's going on. We do it with cats and monkeys; we don't do it with humans. Well, you could argue about the cats and monkeys, but the fact is we do it — that's the way we've learned about the visual system. But you can't do it with the language system: there's no organism that has anything like it, so you can't do it with other organisms, and we don't allow ourselves to do it with humans. We don't raise humans in controlled environments to see what will happen. In principle, if the Nazis had won the war, maybe we'd be doing it — but we don't do it. So it's just very hard to find answers, even to questions where we know how to find the answer; it means you have to be much more ingenious and do much more sophisticated experiments. But these are things at the borderline of inquiry. Actually, yesterday I sat in on the dissertation defense of somebody who's working on this. There were interesting things: white-matter connections between Broca's and Wernicke's areas, two areas that seem to be implicated in language, and there's some evidence that these white-matter connections become super-myelinated in early development in humans but not in apes, connecting the two areas of the brain that seem to be implicated in language. Interestingly, they also have a connection to the auditory system, also lacking in apes. So maybe there's something in white matter that has something special to do with language. That's the kind of topic you can investigate from the outside. In any area of science you know about, you've got many questions like this, so there's no shortage of things to study. It's just a shame that a huge amount of effort, money, and scarce energy is wasted on doing things that make absolutely no sense.

Well, Professor Chomsky, we're going to wrap the conversation here, but can I just thank you so much for coming on our podcast? It means so much to us. Sincerely, thank you.

Thank you very much — pleasure to
talk to you.

What was the best thing about talking with Chomsky?

The best thing about talking with Chomsky? Oh boy. I'll tell you what — and this is kind of personal to me. As you probably know from the show, I like plain language. Sometimes I'm forced to use technical terms, but I like plain language, and I like to think about concrete things and make jokes about, you know, bulldozers — very down-to-earth topics. So when I realized how down-to-earth he is — somebody you could just hang out with and joke with and talk about things concretely, in the real world — and yet he's the foremost intellectual of our era, it was nice to see that you can be both at the same time. You can be an ordinary human being and an extraordinary intellectual all rolled into one.

Yeah, he was taking the piss out of neural networks. He said he liked bulldozers too.

Yeah — they weren't a contribution to science.

Exactly. He's saying they're a great feat of engineering; they just don't contribute to science. And I just found it really enjoyable to talk to him. It was very fun, he's very funny, and all around it was a great experience.

Yeah. And officially, in terms of production and the amount of hours of faffing around, this has been by far the longest cycle we've had on any episode.

Yeah. And of course I've already made some content about how we fixed it, but it's been an incredible slog just getting it all fixed — we've fallen out over it, and we've made up again now — and thank god Chomsky let us publish it.

Yeah. As we've explained elsewhere in the show, a lot of it was accidental — a technological failure that resulted in this challenge. But I have to say, for me personally, it made the journey all the more epic and satisfying that we really pulled it off. As Chomsky might put it, restoring the show doesn't contribute to science, but it was a miracle of engineering. It's one of these life stories that I'm going to remember forever, and in a way the person we were interviewing was the perfect person for it to happen to — other than the loss of quality — just because of all the ironies in the process of doing it. And we really wouldn't have put this tremendous amount of effort in for anyone else.

Unfortunately we ran out of time — the production cycle has been so long — but there were a few moments in the recording where Chomsky was chuckling, and it meant so much to us to capture that mental state. We didn't have time to cut them back into the synthesized version, so here are a few clips of Chomsky chuckling in response to us: "Well, that sounds like an interesting question to me — what is a particle?" "Maybe somebody will notice." "Seems to me like a deep one." "He never carried them out." "The ton of remaining details — it's not
Info
Channel: Machine Learning Street Talk
Views: 463,595
Id: axuGfh4UR9Q
Length: 216min 54sec (13014 seconds)
Published: Sat Jul 09 2022