WE MUST ADD STRUCTURE TO DEEP LEARNING BECAUSE...

Captions
So Paul, where did you come from? Are you nearby Tim? For the last two weeks I've been in London. Oh yeah, tell Keith the story. So they raised some seed capital about 18 months ago and then set a date in February, did you say? Yeah, so the deal is, I think sometime in 2022, George and his co-founder Taylor... George quit Tesla because he thought that the way they were proposing to fix Autopilot, in particular to fix the quote-unquote San Francisco problem for Autopilot, was just way more data for San Francisco specifically, just throwing compute at the wall. George's thought was: this can't be the right way to do it. He said, I'm not exactly sure how to do this, but I want to take a position against scale. It kind of feels like they're trying to memorize infinity. That's correct. So would you say that your experience there informed some of this skepticism? Yeah, I think if you look at the AI Day presentations, which go into deep technical detail about how Tesla learns the world, or Andrej's talk at NeurIPS two years ago, Tesla very much is trying to come up with a function that memorizes the world: basically take all of the possible perceptual states that exist in the realm of driving and produce the correct perceptual signals that result from them, under all circumstances. They're pouring massive amounts of training data into making that happen, running tests in shadow mode to collect triggers that feed more labels back to the labeling team to run that data loop. Operation Vacation, which has been publicly discussed, is effectively that singularity moment where the function converges and has memorized infinity. As I've talked to you about so far, I do not believe that's possible. Neural networks are finite machines; they operate under finite compute constraints with finite memory. It may be possible to build a neural network that could indeed memorize infinity given an infinite number of layers, an infinite number of fully connected neurons, and infinite compute and memory to execute that model, but given the compute constraints in a Tesla vehicle I don't think that's achievable. And so he shopped around for a year or so to find a venture capitalist to fund a research-oriented venture, to make the case that you can do a lot of things without scale. I came on to the company about six months ago, and within about a month and a half we were like: okay, we don't know what we'll be doing in six months, but we need to be selling it to the investors in six months. So we finished the paper on February 1st, or really February 2nd for the North American time zone, because it was a "February 1st anywhere on Earth" kind of thing, and a week and a half later we were meeting with the venture capitalists to get money to do this whole thing. Yeah, quite a whirlwind. I'm just thankful that George created this and it exists, because I take the same line conceptually, which is that the answer isn't just more of the same. We need to go back to some cleverness and invent some new things to really make order-of-magnitude improvements.
You should see this setup: we've got you on a big 55-inch plasma screen, so it's as if you're in the room with us. I'm sorry about that; I would have shaved or brushed my hair or something. You did pat it down in the back. The headset, yeah, it's this headset hair you get. I'm actually getting worried about that: what's the compression going to do to my ear cartilage over time? All these unknown cybernetic effects. A very soft version of cauliflower ear for podcasters. Right, I guess it's something you could be proud of, just like the MMA guys are proud of the cauliflower ear. I've had this headset on for so many hours; 10,000 hours of Skyrim alone. I have achieved true mastery. Okay, well, we're in. Where do we start? Maybe, Paul, since Keith hasn't watched the recording from the other day, just frame up what's going on here, and in particular I think we want to have the Turing machine discussion with Keith on the call. So what's this all about? It depends on which side of the collaboration that resulted in the paper you are on. If you come from the DeepMind side, you are looking for a generalization of geometric deep learning. There are some obvious limitations to geometric deep learning, namely that the transformations relative to which you can ask for a neural network to be equivariant are invertible and always composable. So on their side they're thinking: we need to generalize away from invertibility first, and then generalize away from the presumption that all transformations can be composed, because not all computations can be composed. If you want to use something like GDL to align neural network architecture to algorithmic challenges, you need to figure out how to align it to these not-necessarily-invertible, not-necessarily-composable steps of computation. If you come at it from where I come from, it's: I'm interested in neural networks as a particular instance of morphisms in a 2-category of objects, parametric maps, and reparameterizations, and I'm interested in the kind of structure they're interested in, but thinking about it as algebras for monads or coalgebras for comonads, things like that. Not a higher level of abstraction exactly, but a much more 2-categorical flavor, whereas they're really thinking in terms of representation theory, or a generalized version of representation theory. Bruno and I, Bruno first really, I learned what I know about neural networks through reading Bruno's thesis, and that collaboration's been really instructive. We've really been thinking about this as: there's a 2-categorical story here, we want to do 2-categorical universal algebra, and that gets you way more abstract, which makes it more fun to think about, but it also makes a lot of things tractable that are otherwise far too complicated to reason about. Just as a very tiny question, I think most of our listeners can definitely understand that some computations are not invertible; that's easy. I mean,
you add two numbers together and you don't know what the original two numbers were; they could be a bunch of possibilities, so it's not invertible. But can you give us an example, and some intuition, for computations that are not composable? Oh yeah. In some sense this is what type is for. Consider a computation that requires the input to be a tree: it cannot, at least not naively, be applied to something that is a list. You need to do a conversion between lists and trees before you can apply a computation that takes in trees. So that would be an example of non-composability. Like, branching is not composable, right? If I have an if statement that says if it's one thing do this, if it's something else do that, that poses some issues for function composition. I wouldn't say that, and the reason is that with the function "if this, do this; if that, do that", I can still compose stuff afterwards. It depends: if my "if this, do this" has a different type from my "if that, do that", then you might not be able to compose things together naively, but I wouldn't say that conditionals and branching are really the issue; it really has to do with input and output types. Well, I think the concern there is a type one. And let me state up front: I'm super interested in category theory, I know very little about it, I've only tried some tutorials online, so a lot of my knowledge is pretty far out of date. But I think the issue with the composability of branching, at least in the Haskell kind of functional programming world, is that even if both results are of type T, the outcome of that branch is possibly a union; it's an Optional or a Maybe of those types, it may be one value or another. Is that right? Well, the thing is, Maybe something is still just one type. Right, but it's not type T. No, exactly, it's a higher-order type. Yeah, it's another type, exactly. And so you can't simply plug Maybe T in; you can't apply something that takes input of type T to something whose output was Maybe T. You need to check: is this just a T, or is it the empty case? I can't remember the Haskell syntax exactly. So that's the problem with branches: as soon as you throw branches into code, you start to change the type into these composites. Okay, in that sense I'll buy it: in composition you have to be much more careful about types. So I think it's still a story about input and output types and which things you can compose, but the problem is naive composition: if you start branching things, you realize you've actually changed what type you're dealing with. Yeah. And so, back to my own personal story for a second, because I have a question I'd like to ask you. A long time ago I learned about the power of algebra; when I first took an abstract algebra course I thought, wow, this is really cool, algebra is so much more than I thought it was. Then sometime later I found the real beauty of type theory.
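As a concrete illustration of the composition point above, here is a minimal Haskell sketch (the function names are invented for illustration, not from the paper). Once a branch can fail or produce a different shape of result, its natural return type is Maybe, and a Maybe T output no longer plugs directly into a function expecting a plain T; composition still works, but in the Kleisli style (with >=> or fmap) rather than with ordinary (.).

```haskell
import Control.Monad ((>=>))

-- A partial computation: only even numbers "succeed".
halveIfEven :: Int -> Maybe Int
halveIfEven n = if even n then Just (n `div` 2) else Nothing

-- An ordinary total function expecting a plain Int.
describe :: Int -> String
describe n = "half is " ++ show n

-- This does NOT type-check: describe needs Int, but we have Maybe Int.
-- broken = describe . halveIfEven

-- Naive fix: handle the branch explicitly.
describeHalf :: Int -> String
describeHalf n = case halveIfEven n of
  Just h  -> describe h
  Nothing -> "no half"

-- Or lift describe so the types line up again.
describeHalf' :: Int -> Maybe String
describeHalf' = fmap describe . halveIfEven

-- Two partial steps compose in the Kleisli category of Maybe.
quarterIfPossible :: Int -> Maybe Int
quarterIfPossible = halveIfEven >=> halveIfEven

main :: IO ()
main = do
  print (describeHalf 10)       -- "half is 5"
  print (describeHalf 7)        -- "no half"
  print (quarterIfPossible 12)  -- Just 3
  print (quarterIfPossible 6)   -- Nothing
```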
And this is within a computer science perspective: algebraic data types. I just fell in love with types. Then along comes, somehow, category theory, and I'm thinking: if I go learn some category theory, it's going to help me understand software better, and algebraic type theory and everything. But what I quickly found, and again this could be quite naive, which is why I want to ask you, is that it seems to me category theory is this higher-order thing where, for example, you might look for similarities between algebras: here's Boolean algebra, here's the algebra of partial orders, and maybe they're actually the same algebra or have a lot in common. So there's this beautiful higher-order pattern finding, relationships that might let you prove something in one algebra and give you the tools to map it into a different algebra. But I found it didn't really have a lot of relevance to my everyday software engineering, and so I started to lose interest. Tell me where I'm going wrong: in what ways can category theory help someone who's more of a practitioner, a computer scientist, a software engineer, really understand better what they're doing, or apply insights to the kind of work we do? Unfortunately I'm not the best person to answer that question, but I'll give it a good shot anyway. I come to this whole story from pure mathematics, and I discovered type systems in the context of homotopy type theory: what, there's a programming language that corresponds to space? That was my introduction to this stuff. I did learn Scheme way back in high school, but that was longer ago than I'd like to admit and hadn't been terribly relevant to my intellectual life in recent years. I would say the problem with trying to justify category theory in terms of functional programming alone is that you've already learned most of the category theory that's going to be directly relevant by doing the functional programming itself. You don't get Haskell without category theory, but once you've learned Haskell you've learned a lot about one specific category, and that's the only one in which you're doing all that much work, so the abstractions of category theory don't necessarily give you all that much. That's my naive perspective on it. I remember the first day I came into the Symbolica office, when we were next door to Khosla Ventures; I come in, one person's there, and they're like, "Oh, you're the new category theorist; I hear that's useless." My retort: you like functional programming, right? Yeah. Well, that wouldn't exist in its current formulation without category theory. "Yeah, but I know it through all these other means." Yes: you've already grokked the essence of the category theory that gets applied to functional programming most of the time. I do think that once you get into things like dependent types and more sophisticated type systems,
you need more sophistication on the category theory side to make really good sense of it. You can have a pretty good elementary way of thinking about things like dependent type systems, in the sense of thinking about elements or terms, but, and of course this is how I came to this stuff, so of course I think it's the right perspective, I think category theory gives you permission to have a more geometric interpretation that makes it easier to understand and easier to reason about. The real strength of category theory is not working in any particular category; the real strength is that it gives you a principled language and theory for abstracting things, so that you can remove all the extraneous detail from a particular problem, study just the bare bones of it, get an excellent description and understanding, and then find it in lots of different places. It's the idea of a good library: I do it once and then apply it in lots of different contexts. And I'm absolutely not one of the "category theory is useless" people; I think I'm in the category of: there's probably a lot of really useful stuff in there, I just either don't understand it and/or it hasn't been discovered yet for my particular context. But let me give you an example where my intuition tells me there are probably some really useful insights from category theory to be explored. Long ago, Alexander Stepanov, a really brilliant computer scientist, was trying to do things within type systems that were just not possible in the languages of the day. One thing he wanted, for example, was a type system rich enough to express properties of an algorithm. Maybe you have a bunch of different sort algorithms in your library, and he wanted the complexity constraints of those sort algorithms to be part of the type system, so that when somebody wrote a generic piece of code calling a sort algorithm, with certain properties required, the type system could make the best decision; it could say, in this particular case you should use this sort algorithm, and compute all of that from the type system. He eventually discovered C++, which just by accident had a template metaprogramming capability; it wasn't by design, it just accidentally ended up being a Turing-complete type system, and he was able to implement some of his ideas there, with really ugly code by the way, so I'm not defending the quality of it. But he did lots of brilliant things, which led to ideas like traits and categories in their sense, and algorithms with very rich type systems, template metaprogramming. I feel like we hacked our way to that by accident from the programming side of things, and that category theory could offer a new lens to view it and provide insights for designing the next generation of languages and advanced type systems. Is there any sense to that, or am I off the mark? No, I think there is.
So, one of the things that I'm not working on right now, but have spent a lot of time thinking about, is essentially starting with a particular category and then trying to define a domain-specific language for it, an internal language of it, such that it becomes easy to reason about the stuff in that category in a way that's perhaps a little more intuitive than going toward the high structuralism of purely category-theoretic arguments. Let's pause there for one second, because I love this idea: category theory can help one design domain-specific languages. Yes, and this has become a huge deal. I only know of it from the homotopy type theory community and the kinds of projects that has spawned, but as an instance of this, there's a paper by Mike Shulman called "A Practical Type Theory for Symmetric Monoidal Categories". What he does in that paper is say: there are a bunch of type systems that are interpreted into symmetric monoidal categories, but no one uses them for proving things about symmetric monoidal categories, and he asks why. He nails down a couple of points that all the existing type theories interpreted into symmetric monoidal categories fail to satisfy, and the one I found most compelling is that they don't have this "sets with elements" flavor to them. What's prerequisite for that feeling is that you need tuples of terms to be in tuples of types, and that needs to be symmetric on both sides of the judgment. So he says: I'm going to design a new type system that has that property, with my intended semantics in symmetric monoidal categories. The idea is that you can decide the semantics you want the thing to have and then design the programming language to fit it. I think that's an incredibly powerful trick. It's taking longer than people thought it would, but that's okay, because it's hard. There's a line about how the only reason science works is that you can forget how you got there and just start from new foundations each time; it's a less snarky analogue of the "science proceeds one funeral at a time" thing. You don't want to have to recapitulate all of mathematical history to work with something, and we're finding that there are all of these complicated mathematical structures we really want to reason about and don't want to build from scratch. Instead we'd like to start with something like a programming language or a logic that deals with all of that complexity naturally, by forcing you to be careful about syntax. And this is exactly the study of domain-specific languages for categories, or in mathier language, internal language hypotheses, internal language statements: I want the internal language of this flavor of category, or of that flavor of category.
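To make the "pick the semantics first, then design the language to fit it" idea concrete, here is a small, self-contained Haskell sketch of my own, not the type theory from Shulman's paper: a deep-embedded wiring language whose two composition operators (sequential and parallel) are meant to be read in a symmetric monoidal category of functions, and which also admits a second, purely syntactic interpretation.

```haskell
{-# LANGUAGE GADTs #-}

-- A tiny deep-embedded DSL whose intended semantics is functions with
-- pairing, viewed as a (symmetric) monoidal category.
data Circuit a b where
  Prim :: String -> (a -> b) -> Circuit a b          -- named generator
  Id   :: Circuit a a                                 -- identity wire
  Seq  :: Circuit a b -> Circuit b c -> Circuit a c   -- compose in sequence
  Par  :: Circuit a b -> Circuit c d -> Circuit (a, c) (b, d)  -- side by side
  Swap :: Circuit (a, b) (b, a)                       -- the symmetry

-- One semantics: evaluate a circuit as an ordinary function.
eval :: Circuit a b -> (a -> b)
eval (Prim _ f) = f
eval Id         = id
eval (Seq f g)  = eval g . eval f
eval (Par f g)  = \(x, y) -> (eval f x, eval g y)
eval Swap       = \(x, y) -> (y, x)

-- Another semantics: pretty-print the wiring, ignoring the functions.
draw :: Circuit a b -> String
draw (Prim n _) = n
draw Id         = "-"
draw (Seq f g)  = "(" ++ draw f ++ " ; " ++ draw g ++ ")"
draw (Par f g)  = "(" ++ draw f ++ " | " ++ draw g ++ ")"
draw Swap       = "swap"

example :: Circuit (Int, Int) Int
example = Par (Prim "double" (* 2)) (Prim "negate" negate)
          `Seq` Prim "add" (uncurry (+))

main :: IO ()
main = do
  print (eval example (3, 5))  -- 2*3 + (-5) = 1
  putStrLn (draw example)      -- ((double | negate) ; add)
```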
So you can ask: why the hell are we getting to this, and why now? There's a good deal of "why" to answer. We've just had a successful funding round, we've got $31 million, and we're assembling what's probably the world's largest industrial category theory and machine learning team. The immediate response is: aren't you the only category theory and machine learning team in the world? No, there are probably a couple of them; there's Petar's team at DeepMind, that kind of thing. So we're not the only people in the world doing this, but we will probably, over the next six months, become the biggest. And the question is: how the hell do you get money for this? One of the reasons is that if you look at the history of computation, there have been people banging the drum of purely functional perspectives and category theory as the right foundations for computing, but you haven't needed it before; you've been able to hack your way through with other paradigms. But deep learning and quantum both pose problems for the existing paradigms: it suddenly becomes too hard to make progress without categorically informed treatments of those computational substrates. So let me ask you about this, and apologies, I haven't read the paper yet, but if I take a wild stab based on everything you've said so far about where you might be going: is it the case that category theory will inform the construction of domain-specific languages for machine learning? That is definitely one outcome I foresee. And in my more lay understanding of that, it would be that as a practitioner I would now have a language I could use to define machine learning modules, or compositions of neural networks, in a way that I can actually reason about as a human being, because it's a new set of concepts, a new language I can program in, which will ensure the semantics and structure I need to do something useful. Is that right? That is exactly right. One of the outcomes of this research program, we hope, will be something we can call type-driven ML, or type-driven deep learning. On that, we had a very interesting discussion just before we hit the record button. There's a real problem now with ML engineering, but also with large-scale ML systems like Facebook's, and it's inscrutability. This is a real problem, because we tend to just blame the system, and it gives gainful employment to lots of engineers, because we lie: we say that these things reason, that they operate in a rational way, and they don't, and when they go wrong we just blame the system: oh, it's just a complicated neural network. It's a lie both in how we build the systems and in how we explain the failure modes. And what you're talking about is a future with not only a formal way of understanding and verifying programs, but a more solid footing for how we build systems. Precisely. Our aim is very much to refound deep learning in such a way that you can actually do careful reasoning about it, that you can actually do theorem proving about the behavior of these neural systems, whereas currently we're hacking everything together with benchmarks and heuristics and all of this stuff. We can do better. We know that we can do better with classical programming, so why can't we do better with neural networks?
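As a rough illustration of what "type-driven deep learning" might feel like to a practitioner, here is a minimal Haskell sketch; it assumes nothing about Symbolica's actual design, and the layer arithmetic is a placeholder. The only point is that layer widths live in the types, so a mis-shaped composition is a compile-time error rather than a runtime shape mismatch.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- A layer's input and output widths live in its type. Weights are faked
-- as a single function here; this is a shape sketch, not a tensor library.
newtype Layer (i :: Nat) (o :: Nat) = Layer ([Double] -> [Double])

-- Sequential composition is only accepted when the widths agree.
(>>>) :: Layer i h -> Layer h o -> Layer i o
Layer f >>> Layer g = Layer (g . f)

-- A "dense" layer that just sums its inputs into each output (placeholder math).
dense :: forall i o. (KnownNat i, KnownNat o) => Layer i o
dense = Layer (\xs -> replicate outWidth (sum xs))
  where outWidth = fromIntegral (natVal (Proxy :: Proxy o))

runLayer :: Layer i o -> [Double] -> [Double]
runLayer (Layer f) = f

network :: Layer 4 1
network = (dense :: Layer 4 8) >>> (dense :: Layer 8 1)

-- badNetwork = (dense :: Layer 4 8) >>> (dense :: Layer 16 1)  -- rejected: 8 /= 16

main :: IO ()
main = print (runLayer network [1, 2, 3, 4])
```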
Lots of people say you simply can't do that, that maybe software engineering has to be like neuroscience. And I think: well, there's definitely going to be something like that there, but a lot of it is formally structured, so the key will be to evince that which is formally structured, pull as much of it out as we can, and make it explicit, so that we know the bounds within which this thing is doing some inscrutable curve fitting and the places where it's doing this very formal stuff. That point is really interesting. People have said to me that they think the future of software engineering is like neuroscience, and what I mean by that is: we don't know how the brain works, so we stick probes in and try to figure out what's going on, a bit like the blind men and the elephant, having all of these different views on an inscrutable mountain. And it's just a little bit weird, isn't it? We're building these multi-agent systems, they have all these complex dynamics, and we try to apply engineering techniques by doing monitoring and alerting, having weird thresholds, and fixing problems as they show up, and a lot of people say, well, this is the way things are, this is the way software is going to go. As I understand software engineering, and Keith and I have written a whole bunch of complex software recently, part of the reason for having abstraction is to create a cognitive interface. It's not necessarily about how the software runs, because it gets compiled into low-level native code on the machine; it's about creating an abstraction to help understand and communicate what's going on, and also to do forms of verification later. Well, and that abstraction is a domain-specific language. Every programmer, when they're creating a bunch of functions, a bunch of modules, a bunch of classes, whether they know it or not, is designing a domain-specific language, and we kind of just do it intuitively, without much math helping us. And I think the point, if I understand both of you correctly, is that for machine learning and the quantum community it's beyond our capability to do that intuitively; we need math, it's just too hard to hack it together anymore. Right. I think that's beautiful. In fact there's a very old programming book from the 1970s, I think it's called something like "Designing Programming Languages", which tries to make this point too: when you're writing a program, whether or not you know it, you're creating a domain-specific language, so knowing something about programming language design is helpful even just for writing code. So you said there was at least one thing you hope comes out of this project, Paul: machinery to create domain-specific languages for machine learning. What are the other things you hope come out of it? The other things I hope come out of it are completely novel architectures, architectures that can do kinds of things we currently don't know how to do. So, one thing you might want to do: well, I'm in industry now, I'm not an academic anymore;
suddenly I have to figure out how to eventually generate revenue with this foundational research. So consider Microsoft, for example, and this is a story we've been telling for some time: consider Microsoft and all of the different places they are deploying AI systems, or LLMs specifically, throughout their whole panoply of offerings. The only one that's actually making any money is Copilot. But Copilot is not actually very good; LLMs are not actually very good at generating code. They'll regurgitate some snippets, but a staggering portion of the time they're wrong. The reason we're used to this is that when humans generate snippets of code, a staggering proportion of the time they're wrong too, so we're not offended by it. But if you could make something better, say an LLM that you can interact with in natural language but that actually emulates a type system, you would end up with an LLM capable of doing higher-order processing, higher-order organization. You'll note that I'm intentionally shying away from the word "reasoning", because I take exception to that language; I understand it to be the lingua franca of the practice, but I don't really like it. We take exception to it as well, very much. But can we linger on this, because it's a fascinating point. There are a lot of people out there drinking the Kool-Aid right now. There was the Codex model, which is what's built into VS Code, and that's garbage; GPT-4 Turbo is pretty damn good, and Anthropic's Opus is pretty damn good; there's this thing called Devin which came out recently, and people are saying, oh my God, there's no limit, we're going to get this thing to generate code and then fine-tune the base model on the outputs of that code, and it's going to be a bit like evolution, it's just going to complexify and complexify, and we're going to memorize all of the types of reasoning and all of the types of things we're ever going to see. And I think a lot of the people drinking the Kool-Aid about this just don't do serious software engineering. For example, I was playing with Opus, Claude 3, last night; I got it to generate a Manim animation of a bunch of mathematical graphics, and it worked really well, I thought, bloody hell, it just worked first time. Then I said, can you modify it a bit, can you add this thing, and very quickly it diverged and turned into garbage and was useless. And that's because it's almost like we're wrestling on the boundary of chaos: the complexity of software blows up so fast. It is possible to do something simple, but as soon as you modify it and change it and change it, you're in no man's land very quickly, and I think part of the reason for having this structured, abstraction-based approach is to wrestle with that complexity. So you're making the argument that if you do something like what a large language model does now, but in the abstraction space rather than in the low-level syntax space, then we can get further than we are now? That is precisely the claim. There's a relatively recent paper, I think by a team led by Talia Ringer,
excuse me, I can't remember which university it's based out of, but it's a great paper, and it demonstrates pretty conclusively that Transformers just don't learn recursion. Right, of course, there you go. So why would you ever think that scaling an architecture that has fundamental, by-construction limits on the kinds of things it can even figure out, why would you ask a system that can't do recursion, which we know to be the essence of programming at least in some sense, why would you presume that model is ever going to be capable of doing the kind of software engineering we actually need to do in the world? I think it's almost even worse than that, because they can kind of mimic recursion to a fixed depth, since they have a fixed amount of computation, but the real problem is they can't create abstractions, because creating abstractions is abduction. And I think they probably do create abstractions; they've got latent representations; the problem is the abstractions are not good, they're not principled. But I would argue that we create abstractions, we embed them in our language, and then they get memorized by a language model and can be reused. What we do when we design software is solve this intractable problem, and it's very mysterious how this happens: we select an abstraction from an infinite set, and then we use it to represent a concept, and language models do not do that. The recursion thing is really interesting, because we were having a conversation the other day about this really exciting technology you're working on, and as we were just saying, it's a way of building a semantically principled way to design the future of neural network software, so rather than dealing with these inscrutable neural networks, we're actually dealing with compositional structure with clear semantics at a high level. But the problem is, as I understand it, the end result is still a neural network, and neural networks are finite state automata, so they don't do recursion. And then there's this notion that came up over dinner: there is such a thing as a recursive neural network, but a recursive neural network is still a finite state automaton; the difference is we can modify backprop to recurse over a structured input, so for example we can put trees into these things, so they can do structural recursion. Exactly. Because a recursive algorithm has a kind of stopping condition, a base case if you like, we can put a tree in and the algorithm can recurse over that tree, and what it's doing each time is just embedding: you start with the leaf nodes and go up and up until you get to the top, incrementally embedding that datum into the neural network. But the neural network itself is still doing a fixed amount of computation for that input. So I guess the question is, and what we're really excited about, Petar has been working on this for years, is algorithmically aligning neural networks, and even GDL more broadly is about constraining neural networks so that they generalize better within distribution, on geometric priors.
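A minimal Haskell sketch of the structural-recursion point just made (the embedding arithmetic is a stand-in, not any particular published architecture): the same fixed-size combining step is reused at every node, so the network handles trees of arbitrary shape while still doing a bounded amount of computation per input datum.

```haskell
-- A toy "recursive neural network" in the structural-recursion sense:
-- a fixed cell is applied once per node of a tree.
data Tree a = Leaf a | Node (Tree a) (Tree a)

type Embedding = [Double]

-- Fixed "cells": one for leaves, one for combining two child embeddings.
embedLeaf :: Double -> Embedding
embedLeaf x = [x, 1]

combine :: Embedding -> Embedding -> Embedding
combine l r = map tanh (zipWith (+) l r)   -- stand-in for a learned layer

-- Structural recursion: fold the tree bottom-up into one embedding.
embedTree :: Tree Double -> Embedding
embedTree (Leaf x)   = embedLeaf x
embedTree (Node l r) = combine (embedTree l) (embedTree r)

main :: IO ()
main = print (embedTree (Node (Leaf 0.5) (Node (Leaf 1.0) (Leaf (-1.0)))))
```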
But the thing is, they still only generalize within distribution, so they don't act like Turing machines, which is to say they can't address an expandable memory, a potential infinity; they can't just go to Costco and get some more memory when they need it; they do a fixed amount of computation per datum. Well, this particular question is outside my domain of exact expertise, so I want to be careful; I'm not claiming to be an expert in this domain. But this sounds like a story where they can do coalgebras, however, and coalgebras are things where you can keep popping something off, ad infinitum: you can keep querying it, give me this part, give me this part, and there's no stop to how many times I could ask. So I can do coalgebraic stuff too, but my sense is that maybe this is more a definitional confusion than a real statement about the kinds of things neural architectures can do. I think I can probably help clarify, because for better or worse we've been having a lot of debates in our Discord recently about this topic, so I've learned how to communicate it a little better. First, just to set something up very clearly: if we all understand what a finite state machine is, imagine it as a graph; it sits there and it can get input, and by the way it can get an unbounded amount of input. For example a soda machine: feed it instructions as long as you want, keep pressing the button and putting quarters in all day long, and stuff will continue to happen forever. The point is that it has a fixed number of states, defined at build time when it was constructed, and it just navigates between those states, always responding to the input. That's a finite state automaton. A Turing machine, and by the way this was a genius discovery back in the day, can be boiled down to nothing more than a finite state machine that has an expandable read/write memory. Another way to think about it: it can have two stacks; it can push a value onto one stack or the other, and pop a value off either one, so it has this expandable read/write memory that's unbounded. That simple machine alone is powerful enough to perform all of what are called effective computations. Any mechanical thing you can construct, any physical device you devise that has a digital nature, is no more powerful than this machine, and that includes, by the way, quantum computing; quantum computers are still just Turing machines: they can do certain problems faster, but they can't fundamentally solve any problem a Turing machine couldn't solve. So, getting to neural networks: it should be immediately obvious to anybody that a neural network has that finite state machine component; that's what the neural network is. And now you ask: can I hook that up to a read/write memory and have that neural network do something useful?
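To pin down the two machine classes just described, here is a toy Haskell sketch (the particular machines are made up for illustration): a finite state machine is a fixed set of states stepped once per input symbol, while a two-stack machine adds an expandable read/write memory to the same finite control.

```haskell
-- A finite state machine: a fixed, finite set of states chosen at build time,
-- stepped once per input symbol. It can consume unbounded input, but it has
-- no memory beyond which state it is currently in.
data SodaState = Idle | Paid deriving (Show)

sodaStep :: SodaState -> Char -> SodaState
sodaStep Idle 'q' = Paid   -- insert a quarter
sodaStep Paid 'b' = Idle   -- press the button, dispense
sodaStep s    _   = s      -- anything else: stay put

runFSM :: (s -> c -> s) -> s -> [c] -> s
runFSM = foldl

-- A two-stack machine: the same kind of finite control, but now with an
-- expandable read/write memory (two stacks). In general this is
-- Turing-equivalent; here we only show the shape of the memory operations.
data TwoStack sym = TwoStack { leftStack :: [sym], rightStack :: [sym] }
  deriving (Show)

push :: sym -> [sym] -> [sym]
push = (:)

pop :: [sym] -> Maybe (sym, [sym])
pop []       = Nothing
pop (x : xs) = Just (x, xs)

-- Move one symbol from the left stack to the right stack (a "head move").
moveRight :: TwoStack sym -> TwoStack sym
moveRight m = case pop (leftStack m) of
  Nothing      -> m
  Just (x, xs) -> TwoStack xs (push x (rightStack m))

main :: IO ()
main = do
  print (runFSM sodaStep Idle "qbq")      -- ends in Paid
  print (moveRight (TwoStack "abc" ""))   -- left "bc", right "a"
```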
The problem we run into is training: training the neural network to do something useful with unbounded read/write memory. We don't usually do that. All the recursive neural networks trained today have a fixed window, known ahead of time at training time, where they do operations on, whatever, a context of 100,000, or they operate over 100 windows or something like that, and then you take the answer you get at that point; you've got an output signal. Whenever people try to train them to do unbounded calculations, for example: maybe I have an RNN, and whenever one of the bits in the output lights up I treat that as halting and take the answer at that point; so I run the machine during training, and as soon as bit number 26 lights up I treat that as the halting bit and take the answer; that could be a Turing machine. The problem is that when people try to train machines like that, we just can't do it. So if you think about the space of all possible programs, the space of all possible programs is the Turing machines, and we're doing a certain search. Today we have the capability to construct all kinds of neural networks and use stochastic gradient descent or whatever; we're searching that space of all possible programs and finding one that does something useful. What we're finding is that the space it searches is a very small subspace of all possible programs, really those that are also reachable by finite state automata. We're not getting out of that boundary up into context-free grammars, context-sensitive grammars, recursively enumerable languages; there's this huge other space that we just don't know how to search efficiently with machine-driven techniques yet. Yeah, I will confess I could be wrong, but I don't think that's fundamental; I think it's an accident of particular definitions and the way we've been structuring neural architectures. What I mean by definitions: yes, if you pretend that your thing is of fixed depth; but if you allow feedback into the system, then suddenly I do think you could get access to that higher-order computation. Well, I think it's actually just an accident of our optimization procedures. The architecture is fine; as you said, people do have RNNs with feedback and all that sort of thing; it's an accident of the way we train them. When we go to train them, nobody currently trains in the way I just described, where on a training pass, when I go to propagate my gradients, I run the RNN until a particular bit lights up and then take the answer; I run it for a fixed number of steps and take the answer. Well, the problem is, if you try to run it until a bit lights up, it may never light up. Okay, I see the concern. I don't think this is a real problem; I think it's a question of misconceptions, maybe some that I have, or maybe some that are popular in the ML community. You know, I'm an expert in the CT world, and I'm using CT to speed up my process of developing expertise in ML.
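A small sketch of the training-time distinction being discussed, under the simplifying assumption that the "network" is just some step function on a state and the halting bit is a predicate; no learning happens here. Fixed unrolling always terminates after a known number of steps, whereas run-until-halt has no a priori bound, which is exactly what makes it awkward to train through.

```haskell
-- How networks are actually unrolled for training: a fixed number of steps,
-- known ahead of time, then read off the answer.
runFixed :: Int -> (s -> s) -> s -> s
runFixed n step s = iterate step s !! n

-- The "halting bit" alternative: keep stepping until the model says stop.
-- Nothing bounds this loop; for some inputs it may never terminate.
runUntilHalt :: (s -> Bool) -> (s -> s) -> s -> s
runUntilHalt halted step = go
  where
    go s | halted s  = s
         | otherwise = go (step s)

-- Toy example: repeatedly halve a number.
main :: IO ()
main = do
  print (runFixed 3 (`div` 2) (100 :: Int))           -- always 3 steps: 12
  print (runUntilHalt (<= 1) (`div` 2) (100 :: Int))  -- steps until <= 1: 1
```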
But I don't think this is a permanent problem, in the sense that something about neural computation makes it impossible; I don't think that's the case. I think in particular you could do the thing where, if this bit lights up, it stops; I do think that's possible. And I agree with you, it is possible, and there are attempts, like the differentiable neural computers, or some configurations of neural Turing machines. But it seems like the universe is conspiring against us, because the funny thing that happens is that when you generalize those neural networks and try to train them in this way, it just becomes so much more difficult to train them. Maybe we should discuss the "so what" question. The thing that Petar has been driving at is this algorithmic alignment of neural networks, and the hypothesis is that there are certain fundamental forms of reasoning where you do require this kind of quantification over potentially infinite sets: computing the nth digit of pi, or sorting a set of numbers of arbitrary length. Now, the incredible discovery of neural networks is that there are so many things in the world you can essentially interpolate within distribution and it just works. I'm a big believer in this System 2 idea: there's a lot of System 1 perception-type stuff, and then there's the System 2 where we do need to do this kind of symbolic reasoning. Is there a space where we need to have Turing computation for AI? Even if you could solve all the problems interesting to humanity with a big enough Turing machine, you construct one the size of the sun, made out of neutronium and whatever else, and it can finally do everything we want, you can do it with a lot less energy and a lot less matter if you can use programs that utilize some unbounded, input-sensitive amount of memory. So again it's about this program search thing: for many problems, the best solution, in the sense of the ones that are practical and energy efficient, lies in this broader space of Turing-complete programs. It's certainly been the case for human programming: people could have just written programs that were FSAs, we could have done that, we never needed to develop von Neumann machines and what's sitting on your desk, but we did, because they were extremely useful for solving real problems. And I agree with Paul that there's no theoretical limitation on why you can't train a neural network to search the space of Turing programs; it's just that we haven't figured out how to do it yet, there are some tricks we're missing. I'm just saying the current architectures do not search that space; they search a small subspace, and there are many interesting problems outside it. Yeah, can I frame it a slightly different way? There are two directions here. One direction is what François Chollet says, and what you've just been alluding to, Keith, which is that we build a kind of AI that is almost purely a discrete program search. The other school of thought is that we do some kind of neurosymbolic combination, and Josh Tenenbaum has been doing a lot of interesting work here with
Bayesian program search and DreamCoder and so on, and even things like FunSearch and Q* and Tree of Thoughts are a step in this direction, which is to say we start with this interpolative continuous vector space that has been trained on a whole bunch of stuff, and it's still traversable, so if we put some kind of search controller on top, of course it's not going to be searching the space of Turing machines, but it will give us a margin on top which will find all sorts of interesting things. Just doing this Q*-style algorithm, where we're in an LLM and we search a whole bunch of trajectories with some kind of cost function, that margin might give us a dramatic improvement. But I still think there's a whole space we're missing that is not traversable through neural networks. Let me just say as well that what's happening right now is that the real intelligence of these GPT models is the collective search process, the collective intelligence: we're all using it and putting different inputs into it, so we're a bit like Tree of Thoughts, a bit like Q*; all these people all over the world are finding interesting prompts and traversing the input space, and they form a layer of intelligence on top of the LLMs. I guess the question we're asking is whether the first step would be to replace that human search process with something automatic. I'm just going to jump in real quick and then let Paul go, on two things. Again, there's no theoretical reason, in my mind, why not: a Turing machine is a finite state automaton with a read/write memory, and there's no reason that finite state automaton can't be a neural network; it's just that we haven't yet figured out how to train neural networks that function in that way, at least not for general purposes, efficiently, and so on. And as to which direction is ultimately going to be more successful pragmatically, I don't know, it's a big mystery to me, but it seems like the efforts of Paul and George's company are working toward that route, which is: give us a domain-specific language that's in a sense many generations beyond DreamCoder, a domain-specific language that lets people program these complicated machine learning modules more effectively, and I see no reason why that wouldn't also empower automated programming as well. Yeah, broadly I like that story. I think I've been confused when this question has come up, because I thought the claim was that we somehow can't do this, and I might be wrong again, but at my current level of understanding I don't see any fundamental obstruction to developing neural networks that are Turing complete, or whatever the magic words for it are. I don't see any essential obstruction; we may not have figured out how to do it yet, but I don't see why it's impossible. Well, they have to be a neural network with a read/write memory, and once they have that, so they're augmented neural networks, and once we figure out how to train them, it's kind of open season on every computable problem. Yeah, I think we can agree that it's probably possible in principle and
currently not in practice, and that it is a very, very difficult problem; we can all agree on that. Well, it's definitely currently not done; nobody's doing it that I know of. I do think it's really difficult, and I think there are some magic tricks we haven't found yet, and they may lie in this direction of starting to apply category theory to design domain-specific languages that have the semantics we need. Another thing: when I first read your paper, I visualized it as a kind of compiler. It's a bit like, imagine we had something like JAX, and we did to JAX what TypeScript did to JavaScript: we add typing to it, we add compositional semantics and so on, and what we've done is create this whole interface for understanding how to build the next generation of neural networks. And then the system is a bit like a compiler, so it will compile down to neural network modules that have GDL constraints or algorithmic constraints and so on, and in the future maybe even symbolic networks; it would just automagically do all of this, because what we want to get to, we need to get away from the alchemy; we need this compiler to do the hard work for us so that all of these problems we're seeing at the moment become a thing of the past. I have just a quick question: is one proper way to think about this that we are pushing a lot of the necessary semantic work into syntax, where it can then be automated? That seems to be the story of mathematics as a whole: creating new syntaxes, new sets of rules, new domain-specific languages that effectively systematize, and allow to be automated, the necessary reasoning. Yeah, I think the way a mathematician might phrase this, and Andrew actually provided a very nice description of it, is that the history of mathematics has taught us that abstracting a lot of the specificity away and making your problem statement more and more general, which you might think would make it harder, actually makes it easier, because if there's a lot going on I don't know what to grab on to, but if I reduce the things I'm allowed to grab on to, I can think more clearly about them, and that allows me to reason carefully about them. And here I do mean reason, in the sense of coming up with reasons for things: why does this do this, how does this work, those kinds of things. So you abstract away the essentially analytic detail of a problem, and suddenly it becomes tractable, it becomes conceivable. I would say that what category theory really does is abstract the idea of a semantics. That's really what it is: what is a category? It is a universe of some particular kind of thing; your objects are the instances of that kind of thing, and your morphisms are the computations that go from one instance of that kind of thing to another. So it really is a theory of abstract semantics. Just a quick question on that: aren't you allowed to have more than one type of object in a category? Or
does that have a different name? Well, what does one, I mean, can I have a morphism that takes an object of type T to an object of type S, and then a different morphism that takes objects of the same category, different types, same category? Exactly. So the point is that the notions of object and type are not exactly the same thing. For example, if you're thinking in terms of Haskell, which is probably the most familiar setting for someone listening to or watching this, there's this category Hask, whose objects are all of the possible types that can be formed in Haskell and whose morphisms are all of the possible well-typed functions. So is there a particular name for a category with objects of more than one type, a multicategory or something? I think it's the kind of thing where, once you stare at it long enough, the question ceases to make sense; it's trying to mix these metaphors a little too rigidly. I would say that any given category is the universe of a particular kind of thing, and if you want to talk about categories where you've got multiple different kinds of things, you're actually talking about working in the category of categories, and talking about functors and natural transformations between functors: relations between the universe of things of this kind and the universe of things of that kind, and translations from one into the other. So for any given category you've got one kind of thing, but that doesn't mean category theory at any given time is only about one kind of thing; it's about structural relationships between the universes of different kinds of things, all at once. I'm going to have to go back to those tutorials, by the way; there are these online tutorials by Brendan Fong and some others that I was trying to go through. Oh, those are great. And it's fun, because it's really enjoyable when you come across a topic that blows your mind, and the way I quantify a mind-blowing thing is: if I understand it the day I learn it and then somehow the next day totally no longer understand it, that's kind of mind-blowing. So here's just a funny question for you, and answer it any way you want: which do you think is the more mind-blowing math, category theory or number theory? Oh, category theory. The thing is, number theory I don't understand well enough to have my mind blown. I don't know what the weird results coming from number theory mean; I have no way to interpret them. What, there's a number that does that? This particular number field has some weird abstract property relative to another one? I don't even know what those things mean, because I'm not a number theorist, I haven't spent forever thinking about it. But I have at this point spent at least ten years thinking about
And so I found it to be a lot more mind-blowing, because to have your mind blown you have to understand what it means. Whether that is my own sensibility or essential to the discipline I won't say, but I kind of think that category theory is the more mind-blowing — otherwise I'd be a number theorist. I like that: in order to have your mind blown, you have to first understand, so that it can be blown away. Yeah — because otherwise it's just complete mystery, and mystery is not interesting. Mystery that has the flavor of "I could understand this" — that's interesting; otherwise it's just "huh, that's weird" and I move on with my day. Right — it's got to trigger the sense that something weird happened and I can make sense of it; I haven't made sense of it yet, but I can. I have to have that taste, otherwise it's boring.

That makes sense. So one other thing that confused me a little bit about the category paper: you were talking about being able to create, for example, an analytical pathway, if you like, between, let's say, a list and a tree, and I'm understanding the work as being about data rather than algorithms. The way I understood it — again, I was kind of brought up on Petar's work on algorithmic alignment, where he's thinking about getting neural networks to behave as if they are doing sorting, or Dijkstra's algorithm, or something like that — was that this could become a kind of toolkit where I have these algorithmic primitives and I can compose them together. So do I understand correctly that at the moment it's almost about types of data rather than types of algorithm?

I don't know if those are actually that different, and the reason why is: if I have a well-typed function from one data type to another, that's both about the algorithm and the data. What you're allowed to do with the data is exactly what the algorithm does. So the point is that I don't think that distinction is meaningful.

Interesting. I have a lot of sympathy for that point of view, because this goes back to when I first learned about algebraic data types and their application to programming languages: I started to realize that data and — I don't know what the right word is — data and type, data and the set of transformations allowed on that data, are almost intertwined; one isn't really useful without the other. It's a kind of weird duality or something; I'm not sure how to phrase it. Yeah — I might hazard some sort of dialectical fixed point: one informs the other, and the notions of type and well-typed function reinforce and define each other. It's very much a functionalist kind of thing. Type theory is the core of a syntactic treatment of a synthetic mathematics — at least, that's the weird, not fuzzy but rarefied and unbelievably abstract world through which I discovered type systems.

Maybe another way to think about this — and Paul, I'd like your opinion on whether it may help, Tim — is the following. Suppose you have — and I'm just
going to say "class", not trying to be object-oriented about it — suppose you have a type that's interesting, whatever, complex numbers: you're doing a lot of programming and complex numbers are useful. Yet there are many representations of that complex number. You could literally store it as an angle and a magnitude, or as x–y coordinates, or whatever; there are many possible data representations of it, and it's just irrelevant — you choose one based on whatever works, whatever's efficient. Whereas all the important and interesting stuff is at a higher level than that.

I would be uncomfortable with the statement that it's irrelevant. And in fact — maybe this is not what happens if you're in graduate school for mathematics in places where category theory doesn't have this somewhat deprecated history, using "deprecated" in the slang sense of "this is kind of not-approved mathematics". North American math — specifically the American mathematical academy? Yeah, exactly: it's heretical, or people just don't like it so much. Why? Well, because it doesn't feel like you're doing anything. Why do you do mathematics, what is the pleasure? "I want to feel like I'm doing the computations" — and category theory has this nasty habit of telling you that you didn't do anything. It's like that scene in Good Will Hunting where there are some circles and lines drawn on the board, and he goes up, erases one, draws a line somewhere else, and everyone is amazed. The thing about category theory is — Grothendieck had this statement to the effect that if it's not obvious, you're not ready to work on the problem yet. He had this whole principle of theory-building for problem-solving, and one aspect of theory-building for problem-solving is that you develop a vast library of equivalent, or at least related, representations of similar or the same concepts, and you solve a problem by finding the one in which it's obvious. It's a shamanic mathematics — a mathematics of understanding something and then not really having to do that much. The point is: think about this in the right way and it becomes trivial — or not trivial, but at least clear. The task of science is to carve nature at the joints; well then, you have a bunch of different ways you could carve this up — find the one in which it's easy.

That makes sense. I mean, thinking about all the different possible representations of complex numbers as data representations is still informative of the algebra that you're — yeah, what are you going to do with it, a doubly linked list or whatever — what you are going to do with it is what determines which representation you should use. Yep.
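A small sketch of the representation point made above, assuming nothing beyond standard Haskell: two concrete representations of a complex number behind one interface, where what you intend to do with the value is what makes one representation or the other convenient. The class and field names are ours, purely for illustration.

```haskell
-- Two data representations of the same abstract thing.
data Cartesian = Cartesian { re :: Double, im :: Double }
data Polar     = Polar     { magnitude :: Double, angle :: Double }

-- One interface: the operations we actually care about.
class ComplexLike c where
  mul  :: c -> c -> c
  norm :: c -> Double

instance ComplexLike Cartesian where
  mul (Cartesian a b) (Cartesian c d) = Cartesian (a*c - b*d) (a*d + b*c)
  norm (Cartesian a b) = sqrt (a*a + b*b)

instance ComplexLike Polar where
  mul (Polar r1 t1) (Polar r2 t2) = Polar (r1 * r2) (t1 + t2)  -- multiplication is trivial here
  norm (Polar r _) = r                                          -- and so is the norm
```

If you mostly multiply, the polar representation makes the work trivial; if you mostly add, Cartesian does — which is exactly the "what are you going to do with it" point.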
So you're telling me the Scheme people were right all along — that code is data? Oh, absolutely; they were completely right. Yeah — what was it, 10th grade, learning Scheme? That definitely informed my sensibilities a lot more than I realized, because I went away from doing math and science for quite a while before coming back to it as an undergrad, and as I've gotten deeper into mathematics I realized: oh, I really did drink this functional-programming Kool-Aid, of a deeply pure 1970s variety, a long time ago — and whether or not I understood that that's what I had done, I absolutely had. And I do think they were right: the difference between data and code is that there isn't one. Actually, the difference between data and code pops up in neural networks a lot. People like thinking of training data as just data, but training data is the goddamn program; the neural network is the compiler. You are compiling training data into the final state, which is the function you're actually going to use. This is why I think eventually all of the copyright lawsuits against the major AI companies will go through: they are using copyrighted data to program their machines. That's exactly what they're doing. It's not learning — it's programming. It's a weird programming that works by way of a slightly mutant differential geometry, but it is just programming: you are literally programming your models with data. So any distinction between data and code is nonsense.

That is very interesting — some great sound bites from this conversation, Tim. Oh, for me it was a compound of different little things I was taught. Like when I learned that you could exponentiate matrices — I was like, wow, this is weird. Yeah. Well, that's another cool thing about power series: when you finally learn why e to the i pi equals minus one, and you look at the power series — that's kind of mind-blowing. And then when I saw how to construct, I think it was Peano arithmetic, from this kind of algebraic type theory perspective, and things like that — it all adds up to the same really fascinating fact about the universe, or fact about logic, or something.

Yeah. So we've been using some terminology like syntax and semantics — can we just zoom out a little: what do those terms mean, and how are you using them in this context? Okay. So what I mean to say is: syntax is formal expressions that you do not assume have any meaning; instead you just have rules that allow you to combine them — purely formal constructions, usually in terms of symbols, whether graphical or textual is sort of irrelevant, but the point is it's purely formal manipulation of symbols. And semantics is when those are interpreted as meaningful. The kind of 2-category theory I've been talking about is an abstract semantics, whereas this category with a single object and these endomorphisms — this classifying groupoid of a group — is the categorical avatar, in this case, of
the syntax of being able to write g.x and g'.(g.x) — that kind of thing.

Okay. So, from a linguistics point of view, people understand semantics to be the meaning, the aboutness, of a word — for example, I could say "wagger" and "dog" and they would point to the same cognitive category, which is a dog. But in this framework, just help me understand: what exactly is the semantics? That is a good question, because you're asking about semantics in this very particular way: what is this thing about? Well, the syntax allows you to write down constructions or operations, which are then manifested in the semantic category: the category is the world in which that syntax is interpreted — be it as things like elements, or things like functions, or things like transformations. So there's the semantic category — and then the fact that they're also encoding syntax as a category itself ends up opening another can of conceptual worms, sort of breaking down the easy distinction between syntax and semantics. But the story they were telling — and this is very much what Andrew says — is: you start with syntax; you can write that down into the universal category that has that syntax interpreted as morphisms; and then you look at functorial semantics of that. That is the interpretation of this universal category in which those terms have meaning — which means they have no meaning other than the formal relationships between them that you specified in your syntax — and then you take a functor out of that thing into something else, and that interprets them. That's this notion of functorial semantics.

The story I'm telling is that I'm not even worried about ever writing down any individual terms; I'm just interested in this two-dimensional graphical picture that allows you to talk about what kind of thing algebra is at all. For the story I was interested in — and the way I come at this — we don't really ever study the elements. I want to get away from that, because the elementwise perspective, while it's useful and helpful for the analytic construction of these neural networks, is not that good for proving properties of those networks. It's pretty easy to write an expression with five to ten terms and suddenly it's very confusing, whereas if you draw a diagram with just five or six one- or two-dimensional pieces, you can derive formal properties about it a little more easily.

Okay. Because we were saying earlier that we could use the example of group theory: I could look at all of the symmetry-preserving transformations of a triangle, and the aboutness — the semantics — is the triangle? Exactly: it's about the triangle, about the actual transformations of the triangle; and the syntax is just these things as elements of the abstract symmetry group. That makes sense.

So, when I was discussing this with Andrew, he said that the reason this separation is useful is that you're really talking about two worlds, and he was arguing that we have less representational ambiguity in the world of semantics — so there's a kind of asymmetry.
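As a loose, illustrative reading of the "syntax first, then interpret it" story above — our own toy example, not the paper's formalism — here are purely formal monoid expressions, and two different semantics that interpret the same syntax into a carrier where it means something.

```haskell
-- Purely formal monoid syntax over a set of generators g: no meaning assumed.
data Expr g = Unit | Gen g | Expr g :*: Expr g

-- A semantics is an interpretation: pick a carrier, an operation, a unit,
-- and a meaning for each generator, then fold the formal expression.
interpret :: (m -> m -> m) -> m -> (g -> m) -> Expr g -> m
interpret op e gen = go
  where
    go Unit      = e
    go (Gen x)   = gen x
    go (l :*: r) = go l `op` go r

-- Two different semantics for the same syntax:
asSum, asProduct :: Expr Int -> Int
asSum     = interpret (+) 0 id
asProduct = interpret (*) 1 id

-- asSum (Gen 2 :*: Gen 3) == 5, asProduct (Gen 2 :*: Gen 3) == 6
```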
For example, using this group analogy of the triangle: we can actually draw a grid of all of the symmetry-preserving transformations, and now we've got a very high-resolution way of understanding that triangle. But I guess the question is: is it really true to say that there is less representational ambiguity in the semantics? The question I'm really asking is: are these semantics ordained by God, or are they purely a human construct — does semantics just mean something to humans?

I wouldn't say they are ordained by God, but I would say that what you can do is take the initial semantics of something. If a category is some abstract universe — a universe of a kind of thing, together with morphisms between things of that kind — then you can talk about the initial semantics of something, and by that I mean the semantics that has only the properties necessitated by the syntax and no others. And that is genuinely, necessarily, less ambiguous than any particular syntactic presentation. I'm not going to be able to come up with two distinct such semantics for a group; but I can write down things like groups by way of generators and relations, and there may be — and usually are — many different ways I could write down the same algebraic structure with different syntaxes. However, there is only one semantics for that thing.

Cool. Now, you've written this paper — if you were to give me a one-minute elevator pitch of the paper, what would it be? The title of the paper is "Categorical Deep Learning: An Algebraic Theory of Architectures". One thing that the hardcore category theorists among us might object to is that the words "algebraic theory" actually have a formal meaning in category theory, and that's not exactly what we mean: an algebraic theory to a category theorist is a Lawvere theory, one of these syntactic categories of the allowed manipulations and so on, whereas what we mean is specifically architectures as being the semantics of these universal algebraic notions. So that's the title of the paper — I've gotten distracted by the statement that we said "algebraic theories"; maybe that's not the best name, but it's a pretty good one. Why is it a good name? Because the point is that we want to say that all of the things that have been studied in GDL — and significantly more examples — are in fact instances of algebraic structures, be they structure-preserving maps, i.e. morphisms between algebras for monads, or structure maps of algebras for monads, or the appropriate dual constructions. These are these universal structures, and we're interpreting them into a 2-category whose objects are vector spaces and whose morphisms are parametric maps. A good example of the kind of parametric map that pops up in deep learning is a ReLU network: it is a parametric function, because for any given choice of weights in all of your various layers you get one function, but you don't really want to think of those functions as distinct — you want to group them all together. That's why there's the innovation of this notion of the parametric function. And then you want to talk about being able to change those parameters, and that requires the introduction of the notion of reparameterization.
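For readers who want something concrete, here is a rough Haskell sketch of the parametric-map idea just described — commonly known as the Para construction — with reparameterization acting on the parameter space. The encoding and names are our own simplification, not code from the paper.

```haskell
-- A parametric map from a to b: an ordinary map out of a parameter space p and a.
newtype Para p a b = Para { runPara :: (p, a) -> b }

-- Composing parametric maps accumulates the parameter spaces.
composePara :: Para q b c -> Para p a b -> Para (p, q) a c
composePara (Para g) (Para f) = Para (\((p, q), a) -> g (q, f (p, a)))

-- A one-layer "ReLU network" as a parametric function: each choice of the
-- (weight, bias) parameter picks out one ordinary function Double -> Double.
reluLayer :: Para (Double, Double) Double Double
reluLayer = Para (\((w, b), x) -> max 0 (w * x + b))

-- A reparameterization is a map of parameter spaces, acting on parametric maps.
reparam :: (p' -> p) -> Para p a b -> Para p' a b
reparam r (Para f) = Para (\(p', a) -> f (r p', a))
```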
So, specifically, what we develop — or rather, we don't develop it in this paper; the Para construction has been around for some time — what we really do is say: if we do two-dimensional universal algebra valued in these categories of objects, parametric maps and reparameterizations, that allows us to talk at a very high level of abstraction about things like the geometric priors of GDL, or the structure maps which are exactly the representations of groups that are the foundation for GDL. It allows you to talk about it in a far more abstract way that more naturally generalizes to algorithmic constraints, as opposed to merely conservation-law-type constraints, which is what groups are really good for.

Okay, so let's talk broadly about abstraction. What is the purpose of abstraction, and in this particular case, give us an example of a powerful abstraction that would change the way we reason about neural networks. Okay — what is the purpose of abstraction, from a mathematician's standpoint? If you want to be utilitarian about it, the purpose is: do I want to climb a mountain with a lot of incredibly specific gear, or do I want to learn to be a mountain goat? That's really what the purpose of abstraction is: getting away from all of these specifically adapted tools and saying, okay, these are the tools I'm going to allow myself to use. Suddenly the problem is more general, and you might think that's going to make it harder, but it doesn't — it just teaches you: don't hold on to the individual, unique specificities of each instance. Those are distractions; they are not the structural properties. They are specifics of specific instances, and you want to get away from them, because they are distractions from how you solve the problem in a way that's actually comprehensible.

Okay — I think there's an element of that; there's a sea of complexity we need to tackle. But the mountain goat is quite an interesting example, because I would say that's actually an example of specificity rather than generalization — but then we get to canalization. Yeah, that's fair — this is a question of mixed metaphors, of getting excited about one and getting stuck in it; it's a train of thought that goes a little too far in one direction. But what's the point? The point of abstraction is to get away from all of this extra baggage that you might think is providing you information but isn't. You might think that the easiest way to solve a problem about the natural numbers, or about the integers, would be to know a lot about how two works or how three works; it turns out it's way easier to develop the abstract notion of a ring and prove, and develop, this massive library of theorems about rings. It's actually abstracting away the details that forces you to think in a principled way.

Yes, I get all of that. And duality, for example, is a wonderful feature of category theory, because you can prove something in one domain, and because you have all these categories set up — you have the morphisms and so on — that propagates. I guess this is something we used to do manually in scientific theories: we would have to prove duality on a case-by-case basis. Yeah — so this is a sense in which category
theory allows you to do it once and then interpret it into various contexts and see it as an instance of the same thing. But coming back to the mountain goat example — because that's a weird juxtaposition; it's both specificity and generalization — the reason I said canalization is that we know how evolution works and how living things work: there is this canalization. So this goat might seem like the perfectly well-adapted, well-specified physical form for working in that situation, but actually — and we're talking about compositionality here — if you decompose it into its component parts, those parts represent the ultimate form of generality. Yeah, I would buy that.

Okay, but then one thing I think about here is just how fundamental, how universal, these general components are, because we want to build a structured picture of the world out of building blocks which appear universal — and probably they're not; maybe some of them are universal, maybe some of them are not — but we want to have the Lego set for the universe. No, we definitely want to have the Lego set for the universe. I don't know what the Lego set for the universe is, and I doubt there's only one, but what the last 60, maybe 80, years of mathematics have told us is that category theory is a damn good one. It's a principled theory of abstraction: it says, okay, what is a good abstraction for a notion of structure, and what are the principles by which you should reason about it? And it becomes this task of: the category theorist sees a problem in some particular domain and asks, what are the tools I know that kind of map onto this? It tells you which parts of the problem might actually matter and which parts might not. So you develop a categorical abstraction of it, and then suddenly it often becomes easier to think about, and easier to prove things about.

Yes — and I think the miracle of human cognition in particular, as we were alluding to with Keith earlier, is this: I think abduction is the same as categorization. We select these categories — or we create them; maybe we select them, maybe we create them, that's an interesting philosophical question — but we have the ability to do that, and we're carving up the world; it's our way of overcoming the sea of complexity in the universe. I would buy that; that's a good story. It's sort of like what happens in neural networks: do you want trillions and trillions of parameters that allow you to recreate exactly this, or do you want a much lower-dimensional space of explanations from which you can re-derive, and constantly be checking against, what you're experiencing?

Yes, and this is the story of parsimony in general. On the one hand you could have an inscrutable large language model, which is just a sea of neurons; on the other hand, maybe there is some kind of decomposition of the language model — or forget the language model, maybe there is a decomposition of reality — and no one knows how complex that thing will be. Maybe it's possible to really decompose it down into, say, a thousand types or something like that. But that must be a better way to go than the inscrutable language model. I definitely think so, and that's
definitely the bet we're making with Symbolica. Okay — now there's also the question of what we mean by reasoning. I think we intuitively agree that in order to reason we have to be working on this kind of decomposed, parsimonious view of the universe, and there's a relationship between reasoning and generalization. The way I reason about chess, for example, is that we have this discrete, categorical model of the chessboard, and reasoning is kind of like taking trajectories and doing various forms of deduction and inference in that discrete set. So that's reasoning — or is reasoning more complicated than that? What do you think reasoning is?

My definition of reasoning is heavily informed by what I know from the work of Sperber and Mercier: I am really influenced by the idea that reasoning is the coming up with reasons for things — reasoning is about explanations. This is one of the reasons I bristle against the common ML use of "reasoning": I'm not entirely comfortable with the term, because I think it's a category mistake. I think reasoning is coming up with explanations for things; deduction, or induction, is really what we're mostly talking about in ML. I know we say "inductive reasoning" and "deductive reasoning", and that's popular language, but I've fixated on this perhaps idiosyncratic use of the word "reasoning", and it has informed how I think about a lot of this stuff over the last couple of years.

Interesting. So there is a school of thought, as you say, that reasoning is deduction, induction and abduction. Neural networks definitely do not do deduction — if they do, it's only because they've memorized instances of it; they possibly do a form of induction; and they definitely do not do abduction, which is generating reasonable explanations for things — or if they do, it's only because they've memorized human instances of it. But does it matter? This is kind of like what Bostrom says about intelligence and goals being orthogonal, where goals are treated as universal. This is a really common idea in symbolic AI: people think that intentionality means we have goals, and that these goals are fixed — the reason you behave the way you do is that you had a goal, and that goal was always there and guided your behavior. Another school of thought is that intentionality is "as if": you have agency if you could construct an explanation that explained your behavior, about yourself or other people. But is that a good explanation, if it's just a model rather than what you actually did?

It's a really good explanation if you can make that thing formal — if you can say, okay, here is my purely syntactic description of what that reasoning is, and then I can have a formal interpretation of it and say: actually, this did work like that. It's certainly not going to be the only explanation, but the point is it is a good one. That's the kind of thing I find most interesting. But what makes a good explanation — is it just predictability, or is it intelligibility, or something else? I think intelligibility is hugely important: an inscrutable reason is like
having no reason at all. Exactly. So we intuitively agree about language models: it's not that there is no reason why they do things — obviously there is — but we don't understand what the reason is, and we intuit that the reasoning is not robust. First of all, if you just look at the activation pattern of a neural network: you perturb the input by a tiny amount and it activates almost completely differently. So we think the reasons it does things are super brittle, whereas we think the reasons we do things are robust — and maybe we're just being human chauvinists or whatever — but we think that kind of reasoning has an elevated status. I think it does, but I don't think it's elevated in the sense that neural networks can't do it; I think it's that existing architectures don't.

So what would we need to do to a neural network to make it more robust in that way? That is, in some sense, what the paper is about: the paper is about how you might talk about algebraically structured neural networks. So is it almost like — when we deliberately canalize a neural network (obviously this is all on a spectrum), by reducing these chaotic, random trajectories and random activations, by creating a robust activation pattern which is perhaps more, if not completely, intelligible — it's going towards reasoning? Yeah, I think so.

So, we should do some Category Theory 101, because I'm sure the audience, on average, knows nothing about category theory. In your paper you had a really beautiful introduction, a primer to category theory, where you go through the concepts one by one — can we do some of that now? Yeah, I think so. One story of how you get to categories is the one that Petar will have told you twenty minutes ago in this video: you start with groups, and the first thing you do is relax the requirement that all transformations are invertible — that they can be undone — and the next thing you do is relax the criterion that all transformations can be composed. That does indeed get you to a notion of category, because a category — at least a small one — is a set of objects together with a set of morphisms that have a composability criterion (the end of one arrow has to be the beginning of the next) and a composition law that is associative: if I've got three of them, composing the first two and then the third, or composing the first with the composition of the second and the third, gives exactly the same thing. Okay, so there's this associativity axiom, and there's also an identity axiom? Yes — there is always an identity endomorphism of every object.

Okay, so let's just be clear: a category is like a set of objects and morphisms? It is a set of objects, and then a family of sets of morphisms parameterized by pairs of objects, one being the source and the other the target — or, in different language, the domain and the codomain. Okay, so let's sketch out an example: in the context of GDL, what would that be? Oh — let's do something easier and look at the category of sets.
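A minimal sketch of exactly that data in Haskell — it mirrors the standard Control.Category class in base, but the names here are our own: an identity for every object and an associative composition, with ordinary functions as the motivating instance.

```haskell
-- The data of a category, with Haskell types standing in for the objects.
class Cat hom where
  idC   :: hom a a                        -- an identity morphism for every object
  compC :: hom b c -> hom a b -> hom a c  -- composition, required to be associative

-- Ordinary functions are the motivating instance: the "category of sets" picture.
newtype Fn a b = Fn (a -> b)

instance Cat Fn where
  idC                 = Fn id
  compC (Fn g) (Fn f) = Fn (g . f)
```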
My objects are sets and my morphisms are functions between sets. Not all functions of sets are composable, because I can't feed the output of a function landing in a two-element set into a function that takes its input from a three-element set — that's where the composability criterion comes in.

Okay, so let's talk about morphisms — how do they work? This is the magic of category theory: the sense in which they "work" is that they compose in an associative way, and there's nothing else to them. You've abstracted away any of the specificity of those things being a computation or doing something; they're simply there, and you're reasoning — here I go using "reasoning" again; fine, you're thinking, doing inference — about how you can compose them and what properties those compositions have.

Okay. So in the GDL show there were these groups — there could be the symmetry group of a triangle, but we're also talking about things like SO(3), or groups that preserve rotations and angles, or ones that preserve translational equivariance and so on — and these groups needed to satisfy, like, four axioms. Right: groups are sets together with a binary operation and a specified element, such that the specified element — the unit of the group operation — acts as the identity transformation whether you put it in the first argument or the second, and such that the multiplication is associative (and, for a group as opposed to a monoid, every element also has an inverse).

Okay, so these morphisms are a kind of similar concept: a transformation between two things, but there has to be some kind of structure-preserving characteristic? Well, this is the funny thing: what you're doing is abstracting away the structure that is preserved, and simply talking about what kinds of things you can do with structure-preserving maps. It's like: I don't know what they preserve, but I can compose them. So the idea is: let's just talk about what we can know about mathematics purely in terms of the notion of composable morphisms, composable functions. We don't look inside the functions anymore; we specifically abdicate that position. We say: okay, I'm going to reason about how I can compose stuff, and nothing else, and see how far I can get — and what's absolutely demented is how far you can get with just that. You might think: no, I really need to see what's inside these things. For a lot of properties of mathematical structures, you don't: if you abstract that away, suddenly the same claims can be made, and proved, in various different contexts.

So, just to really spell it out: when you say you can compose these morphisms together — because you were talking earlier about it almost being an interface — Lego is a good example: there's a square peg and a square hole, and these two things go together. For them to be composable just means that if there's a compatibility between them you can plug them together — in any configuration you want? Or not any configuration — there are only the compatible ones, exactly, the ways you are allowed to. So suppose I have one function f that goes from a set X to a set Y, and another function g that goes from Y to Z: I'm allowed to compose first f then g; I'm not allowed to compose first g then f.
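To make the group axioms just recalled concrete, here is a small sketch with our own names: a Group interface and the rotation part of the triangle's symmetry group, i.e. the integers mod 3 under addition.

```haskell
-- A minimal Group interface: an associative operation, a unit, and inverses.
class Group g where
  unit    :: g
  combine :: g -> g -> g   -- must be associative, with `unit` as two-sided identity
  inverse :: g -> g        -- every element can be undone

-- Rotations of a triangle by 0, 120 and 240 degrees: Z/3 under addition mod 3.
newtype Rot3 = Rot3 Int deriving (Eq, Show)

instance Group Rot3 where
  unit                      = Rot3 0
  combine (Rot3 a) (Rot3 b) = Rot3 ((a + b) `mod` 3)
  inverse (Rot3 a)          = Rot3 ((3 - a) `mod` 3)
```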
Okay, so we've done the basic category thing — let's go to the next level of abstraction and talk about things like functors and endofunctors. Okay, so what is a functor? We've said a category is a class, or a collection — usually a set — of objects, together with this doubly parameterized family of morphisms that you can compose. Well, that's a mathematical structure in and of itself, so I can ask: what are the structure-preserving maps between categories? These are the things called functors. What does a functor do? A functor comprises an assignment of the objects of the first category to the objects of the second category, and of the morphisms of the first category to the morphisms of the second category, in a way that preserves composition.

Okay, so again, let's bring this to life, because we were talking earlier with Keith about how you could, for example, transform a list into a tree or something like that — the objects seem very different, but you're still transforming between them in a way that maintains these properties. Yeah — lists and trees. Let's talk about lists valued in a particular set. What do I mean by that? I can think about the monoid of lists valued in a set X, and I can also talk about binary trees valued in a set X, and a map between them is still a set function, but it might preserve some portion of the monoid structure of lists and of the algebra-for-a-monad structure of trees.

So let's move up one more level of abstraction — let's go to 2-categories. How does that work? Okay, so we've talked about categories, and we've introduced this notion of functor as the structure-preserving maps between categories; now natural transformations suddenly pop up. What the hell are they? Well, first off, they're essentially the reason category theory was invented. I think if you asked Mac Lane and — the other one, whose name for some reason I can't remember... Eilenberg, excuse me — if you asked Sammy Eilenberg and Saunders Mac Lane why they invented category theory, the answer is: well, we needed enough to explain what natural transformations were.

So what happens? Suppose I have some category C over here and some category D over here. A natural transformation is a compatible family of deformations — it's a deformation of one functor into another. If I've got C over here and D over here, and I've got one functor F and another functor G, a natural transformation is a two-dimensional morphism that goes from one functor to the other. Data-wise, what does it comprise? For every object of the source category, it is a morphism in the target category that goes from the image of that object under the first functor to the image of that object under the second functor, and it satisfies this criterion called naturality: the deformation is compatible with the morphisms of C as interpreted by the functor F and the functor G. These are the so-called naturality squares. Say I had some morphism f from A to B in the category C, and two functors, capital F and capital G. The natural transformation has components, which are maps of the form F(A) → G(A), and those are supposed to be compatible in the sense that a particular square commutes. What is that square? You have F(f), which goes from F(A) to F(B); you have G(f), which goes from G(A) to G(B); and then you've got two vertical maps — the component of the natural transformation at the object A, and the component at the object B — and the criterion is that this diagram commutes, meaning that composing one way around the square gives the same morphism as composing the other way around.
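In Haskell terms — our example, not the paper's — a polymorphic function between two functors gives a natural transformation, one component per type, and the naturality square is the statement that mapping before or after it agrees.

```haskell
-- safeHead is a natural transformation from the list functor to the Maybe functor:
-- one component  [a] -> Maybe a  for every object (type) a.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- Naturality square: for any f,  fmap f . safeHead  ==  safeHead . fmap f.
-- Both paths around the square agree, for example:
checkNaturality :: Bool
checkNaturality = fmap show (safeHead [1, 2, 3 :: Int]) == safeHead (fmap show [1, 2, 3])
```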
Let's talk about monads — what are monads? Okay. Monads are a gorgeous abstraction for things like group actions — or rather, not for particular group actions, but for the notion of group action. Monads are things that have algebras. Let's look at group actions for a moment: suppose I wanted a group action on an object X... Monads are hard; they're the kind of thing I used to find really confusing, and I still struggle to explain them, but it's the kind of thing where, once you've accepted it long enough — there is no figuring it out, there is only getting over it. Yes, it's very much that kind of thing.

Part of the motivation here is that Andrew in particular — his lens of analysis, as I understand it, is monads; that's his main view on this work. I would actually say it was in writing the paper that Bruno and I came to the DeepMind team with: hey, let's talk more about monads. They were thinking in terms of functorial semantics of syntactic categories, and we said: well, fortunately, if you've got a syntactic category and you're looking at all of the things which are models of that syntax, there is a monad whose algebras are exactly the same thing as models of that syntax. What the monad does is bundle up all of the operations, purely semantically.

Here's a good example of a monad: "list of blank". What does it do? It takes a set X to the set of all lists valued in X. What is the extra structure of this monad? Because what I've just described is just an endofunctor; I haven't described the extra structure of a monad. A monad has a unit, and it also has a multiplication. What is the multiplication of this endofunctor? Well, what if I take list of list of X? The elements of the set "list of list of X" are lists of lists whose entries are elements of the set X. What's the multiplication? I can erase the inside parentheses: I can turn any list of lists into a list by forgetting the nested parentheses. That's the multiplication of the list monad. What is the unit of the list monad? The unit allows you to interpret things of type X as things of type "list of X": I take the element x and associate to it the list that just has x in it. And the laws of a monad are compatibilities between these two notions, the multiplication and the unit.

What it means to be an algebra for a monad is to have a structure map that goes from the monad operating on your object down to your object. What this does is interpret formally — the monad is this abstract, purely semantic representation of the syntax. So if I've got some monad T, I think about what T(X) is: that's all the terms I'm allowed to write with X and the syntax that defines the monad T. And then I want to interpret that back into X, because that's what it means — almost tautologically — to have the structure of an algebra for T.
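Here is the list monad just described, written out with standard library functions; the law checks below are our own illustrative examples.

```haskell
import Control.Monad (join)

-- The endofunctor: X ↦ lists valued in X, i.e. the type constructor [].

-- The multiplication: erase the inner brackets.
multiply :: [[a]] -> [a]
multiply = join            -- for lists, join is just concat

-- The unit: view an element as the one-element list containing it.
unit :: a -> [a]
unit x = [x]

-- The monad laws relate the two; for example, flattening a singleton list of
-- lists, or a list of singletons, changes nothing:
lawChecks :: Bool
lawChecks = multiply (unit xs) == xs && multiply (map unit xs) == xs
  where xs = [1, 2, 3 :: Int]
```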
What does that mean in the case of this list monad? Well, the algebras for the list monad are these things called monoids, where a monoid just has a binary operation with a unit — you don't have invertibility. How does that make sense? Okay, let's consider an instance of a monoid with which we are all familiar: the natural numbers. I can take a list of natural numbers and interpret it as a natural number. What's the right way to do that? Well, I add them. That is the structure map of the natural numbers as a monoid.

What if there isn't a natural way to turn a list into a single number? For example, you could multiply them — why add them? That's just one of the structures — there are lots of different ways a particular set can be endowed with the extra structure of being an algebra for lists; there are lots of different monoids on the same set. Oh, I see — so you would have one per operation, or whatever? Exactly. The point is that each abstract conception of an operation gives rise to a different monad, whose algebras are all of the different structures that can be written in terms of that operation. But the magic of the theory of monads is that it gets you away from having to write those things down syntactically; instead you can just look at the two-dimensional semantics of it. I mean, it's almost like a kind of templating function — a pattern language for algebra. Yes, exactly.

And this brings me to Haskell, for example, which has a few monads baked into it — how does it work in Haskell? Oh, okay, these are some of my favorite monads. I am not a Haskell expert, so hopefully none of the FP zealots in the audience will crucify me in the comments for getting this wrong, but in Haskell a lot of the time what do you do? You start with an endofunctor F, and then you want to talk about the monad generated by F — the free monad on F. How do you do this? Well, you sort of infinitely compose F with itself. Why does that give you a monad? Well, take omega — i.e. the infinity which is the size of the natural numbers — do F omega-many times and then do it one more time: that's the same thing as just doing F omega-many times. So F-to-the-omega composed with F-to-the-omega is just F-to-the-omega, and that's the multiplication of a monad. So you can go from the structure of an endofunctor to the structure of a monad in this relatively elementary way, by believing in infinities.
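The free-monad construction sketched above is standard in the Haskell ecosystem (for instance in the free package); a minimal self-contained version, with our own constructor names, looks like this:

```haskell
-- The free monad on an endofunctor f: layers of f stacked arbitrarily deep,
-- with Pure playing the role of the unit.
data Free f a = Pure a | Layer (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)   = Pure (g a)
  fmap g (Layer fa) = Layer (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g   <*> x = fmap g x
  Layer fg <*> x = Layer (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a   >>= k = k a                       -- unit behaves as the identity
  Layer fa >>= k = Layer (fmap (>>= k) fa)   -- grafting erases one level of nesting
```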
Yeah. So what else do we need to cover on category theory for the purposes of our paper — what do you really need to know? That's the question you've asked. You need to know what a category is, you need to know what a functor is, you need to know what a natural transformation is, and you need to know what an endofunctor is — that's one of the easiest ones: an endofunctor is just a functor from a category back to itself. And you need to know what a monad is: a monad is an endofunctor, but it's also more — it's an endofunctor together with some extra data. That extra data is a natural transformation from the composition of the endofunctor with itself down to just one copy of it, and a natural transformation from the identity endofunctor into it, satisfying some properties — and these are the axioms of a monad.

Okay. And why do we need endofunctors? It may seem like a naive question, but why do we need a way of transforming something back into itself? This is what the constructors of a type are. How do I build lists? I have a notion of what the empty list is, and I have a notion of how to append something to it. So consider the endofunctor "one plus blank": if I've got a map from the coproduct 1 + X down to X, what am I doing? I'm picking out a special element of the set X, and I'm picking out an endomorphism — not an automorphism, an endomorphism — of the set X. I'm picking out zero and I'm picking out successor. I said that was lists, but it's really more like interpreting the natural numbers — it's that kind of thing. The point is that endofunctors are interpretations of abstract terms, and that's exactly why monads are particularly nicely structured notions of abstract terms, where the algebras for the monad interpret those terms into the original object.

And where can people at home learn about this in more detail? I don't know — where the hell did I learn about monads? Honestly, I learned about monads from watching people give a bunch of talks about monads and eventually believing in them. If you want to actually read something about it, probably the best thing to do is go on YouTube, look up "monad", and find someone — not from the FP world — to tell you what a monad is. I'm sure some of the Catsters videos will do a great job of telling you what it is; likewise some of the Fong and Spivak videos from their MIT course years ago. It's surely covered in Seven Sketches, and it's definitely covered in all of the pure-maths-oriented texts on category theory — the classic Categories for the Working Mathematician, or Steve Awodey's book, which I learned a lot of category theory from; I finally understood the Yoneda lemma by reading that book a couple of times. It's going to be in lots of places — essentially, just pick one and fight with it long enough and eventually you'll get it.
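Picking up the "1 + X → X" remark above, here is a short sketch in our own notation: the endofunctor whose algebras consist of a chosen element together with an endomorphism — zero and successor — which is exactly the shape of the natural numbers' constructors.

```haskell
-- The endofunctor  X ↦ 1 + X.
data NatF x = ZeroF | SuccF x

-- An algebra for a functor f on a carrier x is a map  f x -> x,
-- i.e. for NatF: a chosen element of x together with an endomorphism of x.
type Algebra f x = f x -> x

-- The natural numbers carry the canonical such algebra: zero and successor.
data Nat = Zero | Succ Nat

natAlgebra :: Algebra NatF Nat
natAlgebra ZeroF     = Zero
natAlgebra (SuccF n) = Succ n

-- Any other algebra interprets the same "constructor syntax" elsewhere, e.g. on Int:
intAlgebra :: Algebra NatF Int
intAlgebra ZeroF     = 0
intAlgebra (SuccF n) = n + 1
```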
Yes. Now, you mentioned Lawvere theories earlier — what are they? Okay: formally, a Lawvere theory is a small category which is complete under the formation of finite products. That's not a terribly useful description of it — or rather, to say it's not useful is wrong: it's exactly right once you already know what it is, but it's not the right way to learn about it. So why do you want it to be closed under finite products? Because finite products are exactly what allow you to phrase functions that go from tuples of things to other tuples of things. Say I want to encode the laws of a monoid: there's a binary operation for a monoid, so to encode that I need a map from X × X down to X. So what a Lawvere theory is — I'm not going to do it precisely, because I'll mess it up if I try to do it right now — is a category which encodes the compatibilities and relationships between a bunch of n-ary-to-m-ary operations; it is a syntactic category that has no structure except the specified relationships between a bunch of different finite-arity operations.

Cool. It's been an honor — thank you so much. Yeah, that feels pretty good.
Info
Channel: Machine Learning Street Talk
Views: 81,491
Id: rie-9AEhYdY
Length: 109min 10sec (6550 seconds)
Published: Mon Apr 01 2024