#51 Francois Chollet - Intelligence and Generalisation

Captions
"Past a certain level of complexity, every system starts looking like a living organism." "In order to build a general intelligence, you need to be optimizing for generality itself." "We are surrounded by isomorphisms; just like a kaleidoscope, it creates a remarkable richness of patterns from a tiny little bit of information." "Generalization is the ability to mine previous experience to make sense of future novel situations." "Generalization describes a knowledge differential: it characterizes the ratio between known information and the space of possible future situations." "To what extent can we analogize the knowledge that we already have into simulacrums that apply widely across experience space?" "So intelligence, which is to say generalization power, is literally sensitivity to abstract analogies, and that's in fact all there is to it."

In today's show we are joined by François Chollet. I have been using the Keras library for many years, and I read his Deep Learning with Python book, which was inspiring; I also discovered his racy Twitter feed. When I worked for Microsoft I used to run machine learning seminars, workshops and hackathons; I used to travel around the world, and I always had a copy of François's book under my arm. It never left my side. I used to force everyone to read the first four chapters of that book, and of course the chapter on the limitations of deep learning, before we did anything. François has a clarity of thought which is unparalleled, I think, in any other human being on the planet; it's really quite incredible. Indeed, even our own Dr. Duggar, who normally has no trouble at all finding holes in our guests' work, had this to say while prepping for the show: "I am working on it. It turned out to be a little bit more difficult than I thought. Chollet is a little bit too reasonable." Do you like my Duggar accent? He would enjoy me doing that. Anyway, Chollet is extremely controversial to some people, actually, but he's not controversial to us.

Our discussion today lies at the intersection of machine learning and reasoning. Chollet has made his vision completely clear about what he thinks the future of machine learning is. Make no mistake: what you should take from today's episode is that the future of artificial intelligence is going to be discrete as well as continuous; actually, the two are going to be enmeshed. The future of AI will almost certainly involve a large degree of program synthesis. Deep learning has its limits: you can use deep learning for continuous problems where the data is interpolative and has a learnable manifold, and where you have a dense sampling across the entire surface of the manifold between which you need to make predictions.

For Chollet, generalization itself is by far the most important feature of intelligence and of developing strong AI. He describes a spectrum of generalization, starting with, for example, a chess algorithm, where there is no novelty to adapt to whatsoever: the task is fixed. The machine learning we have today confers some adaptation within a known domain of tasks, for example being able to recognize dogs or cats across a variety of different poses and lighting conditions. What has not been robustly demonstrated so far is broad generalization: adaptation to unknown unknowns within a known but broad domain. It's certainly true that we're knocking on the door of this now with GPT-3, where the subtask, if you like, is given at test time, although Chollet would make the argument that the subtask isn't learned at test time: everything that GPT-3 knows was learned on the vast amounts of training data that we trained it on.
The POET algorithm from Kenneth Stanley appears to be meta-learning tasks as part of the training process, which is very interesting: it's creating new problems and new solutions as part of the training process. But broadly speaking, in the machine learning space at the moment, the task that we are doing is fixed and not generalizable. The other thing is that the real world does not have a static distribution; we need systems that can adapt dynamically. Intelligence requires that you adapt to novelty without the help of the engineer who helped you write the system.

Chollet has come up with a formalism of intelligence that balances task skill against difficulty, prior knowledge and experience, to effectively quantify and normalize an algorithmic information conversion ratio. It's the ability to convert experience into future skill that is Chollet's measure of intelligence. At the end of his Measure of Intelligence paper, François introduced the ARC challenge, which became a Kaggle competition as well, and it introduced a massive diversity of tasks. The reason we have a diversity of tasks is developer-aware generalization: any model that we have needs to generalize to tasks that the developer was unaware of. And Chollet thinks that intelligence is specialized; it needs to be human-centric, or anthropocentric, so the kind of priors that you need to solve these intelligence tasks need to represent the kind of priors that we humans have. Now, machine learning algorithms are completely ineffective against the ARC challenge, because it's so challenging to generalize from a few examples; the only solutions that were effective in the ARC challenge were program synthesis.

The manifold hypothesis is that natural data forms lower-dimensional manifolds in its embedding space. There are both theoretical and experimental reasons to believe this is true. If you believe it, then the task of a classification algorithm is fundamentally to separate a bunch of tangled manifolds. The only way deep learning models can generalize is via interpolation, and most perception problems in particular, according to François, are interpolative. Neural networks not only have to represent the manifold of the data that they're learning; the manifold also needs to be learnable, and that's an even tougher constraint. Gradient descent will not learn data that has challenging discontinuities in its manifold; it'll just resort to memorizing the data. Deep learning allows you to represent complex programs that you couldn't write by hand, but on the other side of the coin, it also fails to represent very simple programs that you could write by hand: discrete programs. So there are some problems where deep learning is a great fit, and there are other problems where deep learning is a disaster, and the reason is that they are not interpolative in nature; these tend to be algorithmic reasoning problems. François thinks that 99% of the software written as code today is not interpolative in nature, and therefore a bad fit for deep learning; the only answer to these problems is discrete program search. Using deep learning for these problems requires a lot of data, it's hard to train, and the representation will be glitchy and brittle. Neural networks cannot even extrapolate the scalar identity function f(x) = x; they can only interpolate, given the existence of a smooth manifold in the latent space.

Yann LeCun recently said to Alfredo that all high-dimensional machine learning is extrapolation. So is this similar to interpolation?
"Well, I mean, all of machine learning is similar to interpolation, if you want. When you train a linear regression on scalar values, you're training a model: you're given a bunch of pairs x and y, and you're asking what are the best values of a and b for y = ax + b that minimize the squared error of the line's predictions against all of the points. That's linear regression; that's interpolation. All of machine learning is interpolation. But in a high-dimensional space, there is essentially no such thing as interpolation; everything is extrapolation. So imagine you are in a space of images: you have color images, 256 by 256, so a roughly 200,000-dimensional input space. Even if you have a million samples, you're only covering a tiny portion of the dimensions of that space; those images sit on a tiny sliver of surface within the space of all possible combinations of pixel values. And so when you show the system a new image, it's very unlikely that this image is a linear combination of previous images. What you're doing is extrapolation, not interpolation. In that sense, all of machine learning is extrapolation, which is why it's hard."

I'm being brave calling out Yann LeCun, the godfather of deep learning, but hear me out. It's certainly true that interpolation on the native data domain is useless: we need to pull some useful information out of the data, and the model architecture and training method matter a lot here. We can all agree that interpolation on the learned manifold would seem like extrapolation in the original space of the data. Chollet is quite clear that neural networks only generalize through interpolation. You might argue that you can go a tiny step outside the convex hull of your data, even by a tiny little bit, and technically extrapolate. Well, I would argue that if the manifold doesn't give you any useful information outside of the training range, then that wouldn't be any better than finding your nearest training example and just adding a bit of random noise. If you train a GAN, for example, you can interpolate on the latent manifold, but interestingly, you can also extrapolate, and the reason is that the natural manifold that face data sits on might be shaped like a football or a sphere, which means that if you go outside of the training range, you actually have some information about those data points. The scalar identity function might seem like a contrived example, but it's a really interesting one: when you go outside of the training range, nothing about the manifold is known. Think about that manifold: it's just a string that goes on forever; we don't know anything about it outside of the training range. This is not true for most perceptual problems in deep learning, and this is why image models, for example, suffer greatly drawing straight lines. What are your thoughts about this? Why don't you let us know in the comments section on YouTube.

So there's a really interesting dichotomy of continuous problems versus discrete problems that we're going to be exploring in the show today. It's very interesting that brittleness works both ways, depending on the discreteness of the problem: program synthesis would be extremely brittle at classifying cats versus dogs, or even MNIST, and deep learning would be extremely brittle at predicting the digits of pi, or prime numbers, or sorting a list. Brittleness here means the overall fit of your model or your program: accuracy and robustness.
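The scalar-identity point from a moment ago is easy to try for yourself. Below is a minimal sketch, assuming TensorFlow/Keras is available; the architecture and training budget are arbitrary choices, and exact outputs will vary by run, but the qualitative picture, good interpolation inside the training range and drift outside it, is robust.

```python
# Sketch: an MLP fit to f(x) = x on [-1, 1] interpolates well but
# drifts once queried outside the training range.
import numpy as np
import tensorflow as tf

x_train = np.linspace(-1.0, 1.0, 256).reshape(-1, 1)
y_train = x_train.copy()  # the scalar identity function

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=500, verbose=0)

# Inside the training range, the fit is near perfect...
print(model.predict(np.array([[0.5]])))  # close to 0.5
# ...but nothing constrains the curve beyond the data it saw.
print(model.predict(np.array([[5.0]])))  # typically well off 5.0
```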
Imagine if every single bug you experienced with computer software was entirely unique to you, and the development team wouldn't even be able to reproduce it. This is what would happen if software were written entirely with neural networks: it would be more, not less, brittle. Chollet thinks that motivated thinking is the primary obstacle to getting people to wake up to the fact that neural networks are poorly suited to discrete problems: the people who are good enough at deep learning to realize its limitations are too invested in its success to say so.

Chollet fundamentally thinks that there are two types of thinking, type 1 and type 2. He thinks that every single thought in our minds is not simply one or the other; rather, it's a combination of both types. Type 1 and type 2 are enmeshed together in everything you think and in everything you do; even our reasoning is guided by intuition, which is interpolative in nature. Chollet thinks that abstraction is key to generalization, and the way we perform abstraction is different in continuous versus discrete space: we need to find analogies, and those analogies will be found differently in those two spaces.

Program search allows us to generalize broadly from just a few examples. It marks a significant deviation from traditional machine learning: rather than trying to interpolate between the examples you have, you're constructing an entire search space from scratch and testing whether candidates fit your training data. It all started with the Flash Fill feature in Microsoft Excel. Do you remember that? You give a few examples of some transformation that you want to perform, and it generates a piece of program code for you, which means it can generalize that transformation across an entire spreadsheet. It's quite a revolutionary idea; it's been around for about 20 years, actually, but what's really making it work now is the idea of using neural networks, a neural engine, to guide the discrete program search.

We spoke about GPT-3. Chollet thinks that GPT-3 hasn't expanded his knowledge of the world. He says that GPT-3 is not learning any new algorithms on the fly; it has already learned continuous, and often glitchy, representations of existing tasks during its training. It's completely ineffective against his ARC challenge tasks. People often claim that neural networks are Turing complete. No, they're not: a model has a bounded number of nodes and a bounded run time; it cannot execute algorithms that require unbounded space or unbounded time. For example, could you train a neural network to predict the nth digit of pi? No, you couldn't. You could write a computer program to do it, but you couldn't train a neural network to do it. A simple Turing machine program can do just that, and that is because a Turing machine can access unbounded memory and time. The best that neural networks can do is approximate unbounded algorithms, but doing so will introduce glitches. For example, one can train a neural network to approximately multiply integers together, yet even when learning to multiply fixed-width integers, practically sized neural networks occasionally introduce errors, and for a fixed-size neural network these errors grow more common as the size of the input grows. That said, neural networks are finite state machines, and just as finite state machines can be augmented with unbounded memory and iteration to yield a Turing machine, neural networks can also be augmented in the same way to produce a Turing-complete computational model.
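The nth-digit-of-pi claim is worth making concrete. Here is a sketch of such a "simple program": Gibbons' unbounded spigot algorithm, reproduced from memory, so treat it as illustrative. The point is that its integer state grows without bound as more digits are demanded, which is exactly the resource a fixed-size neural network does not have.

```python
# Sketch: Gibbons' unbounded spigot algorithm for the digits of pi.
# The state integers q, r, t grow without bound as digits stream out;
# a fixed-size neural network has no analogue of this unbounded memory.
def pi_digits():
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), 10 * (3 * q + r) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

gen = pi_digits()
print([next(gen) for _ in range(10)])  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```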
If you want to see a concrete example of the kind of discrete program search that Chollet is talking about, look no further than the recent DreamCoder paper; Yannic just made a video about it.

So, yeah, it feels like today is the culmination of a year of really hard work and passion from the MLST team. We've worked with so many fascinating people and had so many amazing guests on; it really means a lot to us. Today is a very, very special episode: it was my dream from the beginning to get Chollet on the show. I know that Chollet is going to say lots of interesting things that will trigger some people and inspire others, so please take to the comments section and tell us exactly what you think. Anyway, enjoy the show. See you next week. Peace out.

Welcome back to the Machine Learning Street Talk YouTube channel and podcast, with my two compadres, MIT PhD Dr. Keith Duggar and Yannic "Lightspeed" Kilcher. Today we have a very special guest: François Chollet. François is one of the few leaders in the machine learning space who has caused a massive stir in my thinking, the only other notable one being Kenneth Stanley, who we had on recently. My ultimate goal with Street Talk was always to get François on the show, and I can't believe that it has actually happened. We actually have a rule, by the way, that I'm only allowed to invoke François's name about once per show, but that rule will not apply today. Yannic and I have made more content on François Chollet than on anyone else, by a wide margin, because his work is very thought-provoking and disruptive. I spent many weeks studying his Measure of Intelligence paper last year, and of course his recent NeurIPS workshop was fascinating as well. Almost every single word, in my opinion, that comes out of François's mouth deserves rigorous study, and I seriously mean that.

François thinks that intelligence is embodied; it's a process, and it's not just a brain. He's skeptical of the so-called intelligence explosion, and he thinks there's no such thing as general intelligence: all intelligence is specialized. Critically, he thinks that generalization, the ability to deal with novelty and uncertainty, is the most important concept in intelligence, and that task-specific skill tells you nothing about intelligence. He thinks that deep learning only works for problems where the manifold hypothesis applies, that is, problems which are interpolative in nature and where a sufficiently dense sampling of your distribution is available; otherwise deep learning cannot generalize. Deep learning can always memorize, but it cannot always generalize. In his recent NeurIPS presentation he introduced the concepts of program-centric and value-centric generalization, which we'll get into in the show today.

But I wanted to move straight on to this concept of deep learning being kind of like a hash table, because this is what François thinks. He says that a deep learning model is kind of like a high-dimensional curve, with some constraints on its structure given by inductive priors, and that curve has enough parameters that it could fit almost anything. So if you train your model for long enough, it will simply memorize your data. Because of SGD, your manifold fit is found progressively, and at some point, between underfitting and overfitting, the fit will approximate the natural manifold; at this point you'll be able to make sense of novel inputs by interpolating on that manifold. So the power of the model to generalize is actually a consequence of the structure of the data and of the gradual process of SGD, according to François, rather than of any property of the model itself.
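That "enough parameters to fit almost anything" claim is easy to check empirically. Below is a minimal sketch in the spirit of Zhang et al.'s well-known random-label experiments; the dataset slice, architecture and epoch count are arbitrary choices. The same kind of network that generalizes on real MNIST labels will happily drive training accuracy towards 100% on shuffled labels, which is pure memorization.

```python
# Sketch: an overparameterized model fits random labels, evidence that
# generalization comes from structure in the data, not the model alone.
import numpy as np
import tensorflow as tf

(x, y), _ = tf.keras.datasets.mnist.load_data()
x = x[:5000].reshape(5000, 784).astype("float32") / 255.0
y_random = np.random.permutation(y[:5000])  # destroy the label structure

model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y_random, epochs=100, verbose=0)

# Training accuracy approaches 100% even though the labels carry no
# information: pure memorization, zero generalization.
print(model.evaluate(x, y_random, verbose=0))
```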
Last week, François, we were talking to Christian Szegedy, and he takes a rather different view, because one school of thought is that deep learning models are kind of searching a space of possible programs, and advocates of GPT-3 make this argument quite strongly. And presumably Christian Szegedy wouldn't be doing what he's doing, which is interpolating between mathematical conjectures, assuming that interpolation space would actually give us new information about mathematics, if he thought that that space wasn't interpolative. What do you think, François?

"Right, I think you've beautifully summarized it, really. Yes, interpolation is the origin of generalization in deep learning models, and that's very much by construction, by nature. A deep learning model is a very large differentiable parametric model trained with gradient descent, and so the only way it's ever going to be generalizing is via interpolation. It is literally what it is; this is what it does. So I think the question, you know, 'are all deep learning models interpolative or not', is not a super interesting question, because it's not an open question: we know they are. The more interesting question, I think, is what you can actually achieve with this sort of interpolation on the very complex, high-dimensional manifolds that deep learning models are implementing. That will tell you the properties of this generalization: the tasks for which it will perform well, and the tasks for which it will not perform well.

One example I could give you is encoding data with the Fourier transform. You know about the Fourier transform, and maybe, you know, some people will play around with it and they will be like, hey, actually the Fourier transform can draw much more than curves; look, I made a square with it. And then you would have to point out that, no, actually, you've made that square by superposing lots of tiny curves, and it's not in fact a perfect square, because it is made of a superposition of lots of sine curves; and this is true by nature, by construction; this is where the Fourier transform starts. And really, the more interesting question is what sort of data is a good fit for encoding with the Fourier transform, and what sort of data is not a good fit. If you try to encode the T-square fractal with the Fourier transform, you're going to have a bad time, and if you try to encode a drawing that's mostly just, you know, nice smooth curves, then it's going to be a very, very efficient encoding and a good idea. Deep learning is very much like that: we should ask what its strong points are and what its weak points are.

By the way, I don't believe that deep learning models are hash tables per se; I usually say they're locality-sensitive hash tables, meaning something like a hash table with some amount of generalization power, because they have some notion of distance between points. They're capable of comparing points by measuring the distance between them, and this is what enables this kind of hash table to actually generalize, as opposed to the classic kind of hash table, which is just memorizing the data."
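Chollet's Fourier analogy can be reproduced in a few lines of NumPy. A sketch of the square-wave case: adding more sine harmonics tightens the fit almost everywhere, but the overshoot at the discontinuities never goes away (the Gibbs phenomenon), the analogue of the "glitches" discussed throughout this episode.

```python
# Sketch: approximating a square wave by superposing sine curves.
# The partial Fourier sums keep overshooting near the jumps (the
# Gibbs phenomenon) no matter how many terms are added.
import numpy as np

t = np.linspace(0, 1, 2000, endpoint=False)

def fourier_square(t, n_terms):
    # Square wave = (4 / pi) * sum over odd k of sin(2*pi*k*t) / k
    approx = np.zeros_like(t)
    for k in range(1, 2 * n_terms, 2):  # odd harmonics 1, 3, 5, ...
        approx += (4 / np.pi) * np.sin(2 * np.pi * k * t) / k
    return approx

for n in (5, 25, 100):
    peak = fourier_square(t, n).max()
    # The true square wave peaks at 1.0; the approximation keeps
    # overshooting to roughly 1.18 however many terms we add.
    print(n, round(peak, 3))
```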
It's very interesting that you allude to what kind of data the model is good for, and so on. Deep learning models being essentially, as Tim said, big interpolators of arbitrary manifolds: do you think there is something common across the types of data we choose deep learning for? Could we in fact use deep learning for most kinds of manifold-ish data, or do you think there is some kind of specialness about natural signals that makes deep learning very attuned to them?

"So I think most things are to some extent interpolative, which is why you can actually do lots of things with deep learning models. That doesn't necessarily mean it's always a good idea, but it's going to kind of work. You know, when people hear the word interpolation, they tend to think about linear interpolation; that's what pops up in their mind. That's not actually what deep learning models are doing: they are interpolating on this very complex, very high-dimensional manifold, and this enables arbitrarily complex behavior. And in practice, it's always possible to embed an arbitrary discrete algorithm in a continuous manifold. It's not necessarily a good idea, but it's always possible, at least in theory. So for any program you can imagine, you can ask, is there a deep learning model that will encode some kind of approximation of it, and the answer is always yes, similar to how you can always encode an arbitrary shape with the Fourier transform. But if you try to do that, there are some issues: there are very much some problems for which deep learning is a good fit and some problems for which deep learning is not a good fit.

In the limit, the extreme point is a space that is not interpolative at all, which is quite rare, actually; most spaces, even very discrete kinds of spaces, do have some amount of interpolativeness. One example would be, for instance, trying to train a deep learning model to predict the next prime number, or to tell whether a number is a prime number. You cannot actually do that: the best you can do is memorize the training data points, because the space of prime numbers is not interpolative at all, so your deep learning model will always have zero generalization power. But that's actually quite rare, an extreme case. For most problems, even problems that are by nature discrete, algorithmic problems, there will be some amount of interpolation that you can do. That doesn't necessarily mean it's a good idea to try to solve such problems with deep learning models. For deep learning to be a good idea, you very much need the manifold hypothesis to apply, so it works best for perception problems. Any problem that humans can solve via pure intuition or perception is probably a good fit for deep learning, but any problem where you need high-level, explicit, step-by-step reasoning is probably a bad fit for deep learning, and, you know, 99% of what software engineers solve today when they are writing code is going to be a bad fit for deep learning. That doesn't mean that there wouldn't be, theoretically, a deep learning model that can embed the same algorithm in a smooth manifold; this is always possible to some extent. But there are very significant issues with attempting to do this. Just because something is theoretically possible doesn't mean you should actually do it."
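A sketch of the primality experiment Chollet just described. Everything here, the bit encoding, the architecture and the train/test split, is an arbitrary choice made for illustration, and the outcome stated in the comments is an expectation rather than a measured result: the model can drive up training accuracy by memorizing, but there is no smooth structure for it to interpolate into the held-out range.

```python
# Sketch: try to learn "is n prime?" from the binary digits of n.
# Expectation, per the discussion: memorization of the training range,
# near-chance behavior on numbers the model has never seen.
import numpy as np
import tensorflow as tf

def is_prime(n):
    if n < 2:
        return 0
    return int(all(n % d for d in range(2, int(n ** 0.5) + 1)))

def to_bits(n, width=14):  # 14 bits covers every n used below
    return [(n >> i) & 1 for i in range(width)]

train_n = np.arange(2, 8000)
test_n = np.arange(8000, 10000)
x_tr = np.array([to_bits(n) for n in train_n], dtype=np.float32)
y_tr = np.array([is_prime(n) for n in train_n], dtype=np.float32)
x_te = np.array([to_bits(n) for n in test_n], dtype=np.float32)
y_te = np.array([is_prime(n) for n in test_n], dtype=np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(14,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(x_tr, y_tr, epochs=50, verbose=0)

# Held-out accuracy should hover near the base rate of "not prime":
# there is no interpolative structure in primality to exploit.
print(model.evaluate(x_te, y_te, verbose=0))
```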
I think we might not be careful enough about what we mean by "program", because, for example, if I take "program" in the universal sense, a program is something that can run on a Turing machine, and because that type of program has access to unbounded time and memory, it's impossible, in the general sense, to encode it in any finite neural network. I can write a very short piece of code for a theoretical Turing machine that can output the nth digit of pi; it's impossible to do that with any finite neural network. Would you agree?

"Yeah, absolutely. Absolutely."

Okay, because I think that's a big source of confusion, oftentimes, with these statements like "neural networks are Turing complete". Well, no, they're not. If you have a neural Turing machine, where a neural network plays the finite-state-machine piece of a Turing machine, that can be Turing complete, but in the general case, finite neural networks, which is what everyone means by neural networks, are not Turing complete. And it actually has practical effects: this is why we see this sort of explosion in the number of parameters needed to start to accomplish these things.

"Yeah, absolutely; 100%, you're entirely right. So we're only interested in realistic programs, the sort of programs a software engineer would write, for instance, and we're only interested in realistic neural networks. And by the way, the constraint that we have on neural networks is actually much stronger than asking, given this program that I have, is there a neural network that could embed it in a continuous manifold. The constraint is actually: is there a neural network that could not only represent it, but that could learn this embedding of the program from data? And that is several orders of magnitude harder. Learnability is a big problem, because you're fitting your manifold via gradient descent, and if the structure you're trying to fit is too discrete, with too-big discontinuities, gradient descent will not work at all, and the best you can do is, again, just memorize the training data.

Maybe I can give you a concrete example to ground our discussion here. In 2015, a friend of mine used Keras to do something pretty cool, which actually became an example on the Keras website. He used an LSTM model to multiply numbers; but not numbers passed by value: the inputs of the model would be strings, strings of digits, and the LSTM would actually learn the multiplication algorithm for multiplying three-digit numbers, kind of the sort of algorithm we would learn in primary school to do multiplication. And remarkably, that worked; it works just fine. So you can train a deep learning model to learn this algorithm, and you could of course train a transformer model to do the same; it would probably be significantly more efficient. But it comes with a number of downsides. First, in order to train that algorithm, which is very simple, you're going to need thousands and thousands of examples of different three-digit numbers. And once you've trained your model, because the actual algorithm was embedded in the neural network, it does generalize to never-seen-before digits; it's actually learning the algorithm, not just memorizing the data. But here's the thing: because the embedding of an algorithm, of a discrete structure, in a continuous space is not the same thing as the original discrete object, there are glitches. Your deep learning model, unlike something you could have found via program synthesis, for instance, is not going to be correct 100% of the time; it's going to be correct, you know, 95% of the time."
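The Keras example Chollet refers to was in the style of the well-known addition_rnn demo; here is a sketch of the multiplication variant he describes. The hyperparameters, sample count and encoding are guesses rather than the original example's settings, and how close this particular setup gets to the "95%" he mentions is an empirical question; the qualitative behavior, near-but-not-exactly-perfect accuracy, is the point.

```python
# Sketch: a seq2seq LSTM maps the string "123*456" to the string
# "056088", character by character, learning multiplication from data.
import numpy as np
import tensorflow as tf

chars = "0123456789* "
char_idx = {c: i for i, c in enumerate(chars)}
IN_LEN, OUT_LEN = 7, 6  # "ddd*ddd" -> product padded to 6 digits

def encode(s, length):
    s = s.ljust(length)
    x = np.zeros((length, len(chars)), dtype=np.float32)
    for i, c in enumerate(s):
        x[i, char_idx[c]] = 1.0  # one-hot encode each character
    return x

rng = np.random.default_rng(0)
a = rng.integers(100, 1000, 50000)
b = rng.integers(100, 1000, 50000)
x = np.array([encode(f"{i}*{j}", IN_LEN) for i, j in zip(a, b)])
y = np.array([encode(str(i * j).zfill(OUT_LEN), OUT_LEN)
              for i, j in zip(a, b)])

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(256, input_shape=(IN_LEN, len(chars))),
    tf.keras.layers.RepeatVector(OUT_LEN),  # one step per output digit
    tf.keras.layers.LSTM(256, return_sequences=True),
    tf.keras.layers.Dense(len(chars), activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.fit(x, y, epochs=30, validation_split=0.1, verbose=0)
# Per the discussion: validation accuracy on unseen 3-digit pairs gets
# high but not perfect (the "glitches"), and a 5-digit input would not
# even fit the encoding, let alone generalize.
```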
"In much the same way, if you try to encode a very discrete object via the Fourier transform, it's not going to be correct 100% of the time; it's going to be an approximation, and around sharp angles it's actually going to be wrong. And very importantly, the algorithm that you've, you know, painstakingly embedded into your deep learning model via exposure to data does not generalize very well: it only does local generalization. Meaning that if you train it to multiply three-digit numbers and then you send it a five-digit number, is it going to work? No, absolutely not. And not only is it not going to work, but you could not, in fact, few-shot fine-tune your model to learn to handle five-digit, seven-digit numbers and so on; if you want to fine-tune it, you're going to need thousands, maybe millions, of examples. So it's all local generalization. And lastly, it's super inefficient. I think we can all agree that this is not a clever use of an LSTM: you're burning tons of resources for something that is actually super easy.

And you can compare that, since we were talking about the pros and cons of deep learning, to what you could get with a program synthesis engine. I don't want to compare it to what you could get with a human-written algorithm, because kind of the point of deep learning is that it enables you to develop programs that you could not otherwise write by hand; so the right point of comparison is what you could do with deep learning versus what you could do with discrete program synthesis, based on discrete search and a DSL. If you were to use program synthesis to solve the multiplication problem, you would find a solution. Even a very basic engine that has just, maybe, a plus operation and a loop in its DSL is going to find it, and it can find it with a handful of examples: you're not going to need thousands of examples like in the deep learning case; you're going to need maybe five. And the program you get out of it is going to be exact, because it is the exact discrete algorithm, not a continuous, ambiguous approximation, so it does not have glitches: it outputs the correct answer. It will be lightweight, so it will be very efficient, unlike the LSTM or transformer model. And crucially, it's going to generalize. If you developed it only from three-digit numbers, maybe there will be something inside it that hard-codes the assumption that it's dealing with three-digit numbers, but even if that's the case, you can take it and automatically learn a generalized form of it if you just start giving it seven-digit numbers, very easily, because it's probably just a matter of modifying a couple of lines of code. So it is capable of strong generalization.

So here you start seeing how, for a problem that's fundamentally a discrete, algorithmic reasoning problem, discrete search is the correct answer. Deep learning: it's possible, it works, but with extremely stark limitations. It's very hard to train, you need tons of data, and the resulting embedding, because it's not discrete, will have glitches; it's not going to work 100% of the time. And it's only going to be capable of local generalization, because, again, there is a huge difference in representational flexibility between your very simple discrete algorithm and some kind of very complex, high-dimensional continuous manifold."
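To make the contrast concrete, here is a toy version of the discrete search Chollet describes: enumerate compositions of a tiny, invented DSL and return the first program consistent with a handful of input/output examples. The DSL, the accumulator convention and the examples are all made up for illustration; real systems search far richer program spaces.

```python
# Sketch: brute-force program synthesis over a tiny arithmetic DSL.
# Five examples suffice, and the found program is exact and generalizes
# to inputs of any size, unlike the learned LSTM embedding above.
import itertools

# DSL primitives: each maps (a, b, accumulator) -> new accumulator.
PRIMS = {
    "add_a": lambda a, b, acc: acc + a,
    "add_b": lambda a, b, acc: acc + b,
    "mul_a": lambda a, b, acc: acc * a,
    "mul_b": lambda a, b, acc: acc * b,
    "inc":   lambda a, b, acc: acc + 1,
}

def run(program, a, b):
    acc = 0
    for op in program:
        acc = PRIMS[op](a, b, acc)
    return acc

examples = [(3, 4, 12), (10, 5, 50), (7, 7, 49), (2, 9, 18), (6, 1, 6)]

def synthesize(max_len=4):
    # Enumerate programs by increasing length; keep the first that
    # reproduces every example exactly.
    for length in range(1, max_len + 1):
        for program in itertools.product(PRIMS, repeat=length):
            if all(run(program, a, b) == out for a, b, out in examples):
                return program
    return None

prog = synthesize()
print(prog)                     # e.g. ('add_a', 'mul_b'): (0 + a) * b
print(run(prog, 12345, 67890))  # exact on numbers of any size
```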
"And then there's also the efficiency consideration. And the reverse is also true: if you're dealing with a problem that's fundamentally a perception problem, where you have data points that fit on a nice, smooth manifold, then deep learning is actually the right answer, and if you tried to develop, via program synthesis, an actual discrete algorithm to classify MNIST digits, for instance, everything I just said would be true, but in reverse: your program would be brittle, the deep learning model would be robust, and so on. So there are really problems where deep learning is a great fit, and problems where it's a terrible idea. Like, try sorting a list with a deep learning model: can it be done? Yes, actually, it can, but with all these caveats applying. It is possible to sort a list with deep learning, with some hacky inductive priors, and probably by memorizing most of the training data."

And it's not a binary, is it? You said yourself there are lots of problems that fall in the middle, where there is a semi-continuous structure and some regularity, but it's still a discrete problem, and you're saying that in that situation we should still use program search, but maybe we can use deep learning: maybe something about the shape of the manifold, even though it's semi-continuous, could actually tell us how to do that program search more efficiently. But it seems to me that if there are problems out there, let's say adding numbers up in GPT-3... when I read the stuff that you've been talking about here, it seems obvious to me. Why are people not picking up on this?

"I think, you know, most people are not necessarily paying a lot of attention to the nature of deep learning, why it works, why it doesn't work. I also think there are basically two categories of people: there are lay people, and there are people with deep expertise, and the big problem we have here is that the people with a lot of expertise are going to be, a lot of the time, driven by motivated thinking, because, you know, they work in the field of deep learning, and so they're going to have this vested interest in deep learning being potentially more powerful, more general, than it is. I think if you want to think clearly, the primary obstacle is motivated thinking; it's fighting against what you want to be true. So I tend to have super boring opinions, in that sense, because I do my best to try to forget what I would like the world to be in my best interest, and to look at it as it really is, and that will tend to actually diminish the importance of my own work. You know, I've been doing deep learning for almost a decade; of course I would want it to be this incredible, world-changing thing that leads to human-level intelligence right off the bat. That would be awesome, that would be amazing, and I would be right in the middle of it. But that's not actually what's going on."

You said you tend to have, what was the word, not controversial ideas, or something, because you try to stick to the way the world is rather than the way you want the world to be. But we just had Yannic produce an interesting video about how, if you think about it, machine learning models essentially attempt to do the same thing: I mean, they're not human beings, they don't really have wants per se; they're just modeling reality as it is. It turns out reality itself really annoys a lot of people;
they just don't like reality, they don't like the way the world is, they wish it was something different, and that infects every mode of their thinking.

"Yeah, no, absolutely. Most people, and that's true for me as well; I'm honestly not saying I'm an exception. I'm trying to do my best to resist this trend, but I'm no exception: most people have opinions not because they've seen evidence in support of the opinion, but because it's in their interest for this opinion to be true, or because they just want it to be true. I guess one example is, you know, we were mentioning GPT-3, and the proponents of GPT-3. I was actually super excited when I initially saw the claim that a pre-trained language model could perform few-shot generalization. I thought, that's super fascinating. I was excited; I'm always super excited if I hear about something that's really challenging my initial mental model of how the world works. It's like a few years back, when there was this claim that neutrinos had been measured going faster than the speed of light. I mean, that's exciting, right? That's like new physics. You want it to be true; at least you want to get to the bottom of it. And then it turned out to be a measurement error, which is disappointing. It's kind of the same for me here. I really wanted it to be something novel, something that would really challenge what I thought to be true about deep learning models, and I regret to say that everything I've seen so far has actually confirmed my view, which is that, basically, deep learning models can learn to embed algorithms given sufficient exposure to data, but they cannot really few-shot synthesize novel algorithms that represent a pattern they haven't seen at training time. Which is why, by the way, GPT-3 is entirely ineffective on ARC, for instance. And that's kind of sad to me; I kind of regret it, because it means I haven't actually learned anything from it; it hasn't expanded my view of the world, which is too bad. I wish it did.

So, yeah, in the case of GPT-3, what's really going on is that the model has been exposed to many patterns, you could call them algorithms, for instance, in many different contexts, and so it has memorized these patterns, and now it's able to take these patterns and apply them to new data, in much the same way that the multiplication algorithm we were talking about, because it's an actual algorithm, can process new digits; it's not just memorizing the digits in the training data. In the same way, GPT-3 contains tons of small algorithms like that, but the model is not synthesizing these algorithms on the fly; they're in the model already. And if you try to apply GPT-3 to something for which a new algorithm would need to be produced, like ARC tasks, for instance, it is just completely ineffective."

It seems to all build up to what you're saying, because there is this strong generalization versus local generalization, and you make the case that in order to do strong generalization we need maybe something like a program synthesis approach, so deep learning can't necessarily get us there on most problems; and you make an interesting case that something like graph isomorphism search could play a core role in that. Could you briefly connect all of these terms together, the case you're making there? Because it's super interesting.
"So, going back to what Tim was saying: it's rarely the case that you have problems that are fully interpolative or fully discrete. There are definitely such problems; in fact, most perception problems are almost entirely interpolative, and most programs, the kind of programs that humans write, are largely discrete, non-interpolative. But most tasks are actually best solved via a combination of both, and I actually believe that's true for the way humans think. You know, there's type 1 thinking and type 2 thinking, and I strongly believe that almost every thought you have, and everything you do with your mind, is not one or the other; it's a combination of both; type 1 and type 2 are enmeshed into each other in everything you think and everything you do. Like, for instance, perception: it looks like a very type 1 sort of thing, very continuous and interpolative, but in fact there's a lot of reasoning that's embedded into perception. And the reverse is true. For instance, if you look at a mathematician proving a theorem, what they're writing down on the sheet of paper is really a step-by-step, discrete-reasoning type of thing, but it is very much guided by high-level intuition, which is very much interpolative: they know where they are going without having to derive the exact sequence of steps to get there. They have this high-level kind of view, kind of like, you know, when you're driving, you have to make discrete decisions, because you are driving on a network of roads, but if you have a GPS, for instance, you can kind of see the direction in which you are going, and that is interpolative: if you're talking about directions, you're talking about distances, you're talking about geometric spaces. Everything in the human mind kind of follows this model of type 1 and type 2 thinking at the same time.

If you go back to first principles, intelligence is about abstraction. Intelligence is fundamentally about the ability to face the future given things you've seen in the past, and the way you do that is abstraction: you extract from the past some construct, maybe it's a template, maybe it's an algorithm, that will actually be effective in terms of explaining the future. And what makes it abstract is that it can handle multiple instances of some kind of thing, and that thing is an abstraction; and if it's abstract enough, it can actually handle instances you've never seen before: it has generalization power. And all abstraction is born from analogy. Abstraction starts when you make an analogy between two things. Like, you say, hey, time is like a river, if you want to get philosophical, or something; but in general you can just say, this apple looks similar to this other apple, so there is such a thing as the concept of an apple, for instance. And the part that is shared between the two things that you are relating to each other, the subject of the analogy, that's the part that can be said to be abstract; that is the part that will help you make sense of the future. Like, you encounter a third apple in the future: you know it's an apple because you don't even need to relate it to the apples you've memorized; you just need to relate it to the template, the abstract template information that you've formed from exposure to different kinds of apples in the past.

And if you think about what an analogy really is, fundamentally, it's a way to compare two things to each
other, and there are really only two ways to compare things. You can basically ask how similar they are in terms of distance: implicitly, you're looking at a space of points, and there's a distance between two points. That's the type 1 sort of analogy, which leads to type 1 abstraction, which leads to type 1 thinking. A type 1 analogy is where you rate things by the degree to which they're similar to each other; you rate them by distance, so implicitly you put your things in a geometric space. And the type 1 abstraction is going to be a template: you're going to have clusters of things, so you can take the average and say that everything within a certain distance of that template belongs to this category. That's type 1 abstraction. It's very much the way deep learning models work, and then you start having perception and intuition on top of that, which is very much type 1 thinking.

And the other way you can compare two things is the discrete way: you can say these two things are exactly the same, they have exactly the same structure, or maybe the structure of this thing is a subset of the structure of this bigger thing. So that's discrete, topology-grounded comparison. You have the geometry-grounded comparison, which is all about distances and templates, and then you have the topology-grounded way of comparing things, which is all about exact comparison, about finding subgraph isomorphisms. In the first case, your objects are very much points in geometric spaces, so they are vectors, and deep learning is always a great fit for this sort of thing; and in the second case, your objects are going to be graphs, and you're really looking at the structure of these graphs, at substructures and so on, and you're doing exact comparisons. And in practice, most thinking is actually some combination of these two poles. You're very rarely just going to say, yeah, this apple is exactly this close to my template of an apple, so it's an apple; you're going to have layers upon layers of thinking, and some of them are going to be intuitive, and some of them are going to be more about, you know, comparing structures and so on."
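A toy rendering of the two comparison modes Chollet just described, with invented data: type 1 rates a new object by its distance to a learned template in a vector space; type 2 asks whether one structure appears exactly inside another, here via a brute-force subgraph check.

```python
# Sketch: the two ways of comparing things described above.
import itertools
import numpy as np

# Type 1: geometry-grounded. Objects are vectors; "is it an apple?"
# means "is it within some distance of the apple template?"
apples = np.array([[0.9, 0.1], [1.0, 0.2], [0.8, 0.15]])  # toy features
template = apples.mean(axis=0)  # the type 1 abstraction: a cluster mean
new_fruit = np.array([0.95, 0.12])
print(np.linalg.norm(new_fruit - template) < 0.3)  # True: close enough

# Type 2: topology-grounded. Objects are graphs (edge sets); comparison
# is exact: does one structure embed in the other under some mapping?
def subgraph_of(small_edges, small_n, big_edges, big_n):
    big = {frozenset(e) for e in big_edges}
    for mapping in itertools.permutations(range(big_n), small_n):
        if all(frozenset((mapping[u], mapping[v])) in big
               for u, v in small_edges):
            return True
    return False

triangle = [(0, 1), (1, 2), (2, 0)]
square_with_diag = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(subgraph_of(triangle, 3, square_with_diag, 4))  # True: exact match
```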
What you're saying is really interesting, because you invoke the kaleidoscope hypothesis in your paper, and the idea there is that a tiny bit of information, just like in a kaleidoscope, could be represented widely across experience space. So you say that intelligence is literally having some kind of sensitivity to abstract analogies.

"So intelligence is about being able to face an unknown future given your past experience, and that fundamentally requires the future to share some commonalities with the past. And that's the idea of the kaleidoscope hypothesis: that the universe, and our lives, are made of lots of repeated atoms of structure, and in fact, if you look at the source, there are very few things that are unique; they are kind of like the grains of glass that are at the origin of all the different kinds of moving patterns you can see in the kaleidoscope. So the intrinsic structure contained in the universe is actually very small, but it is repeated in all kinds of variants. And the idea is that if you see two things in the universe that look similar to each other, or that share some commonalities, a subgraph maybe, it fundamentally means that they come from the same thing, and that thing is going to be an abstraction; it's going to be one of these grains of glass in your kaleidoscope. And intelligence is all about reverse-engineering the universe to get back to this source of intrinsic complexity in the universe, to get back to these abstractions."

I think the heart of this conversation goes back thousands of years, because what we're talking about right now sounds a lot like, say, Platonism: there are these ideal, abstract structures, and of course the Platonists really thought of them as actually existing in some universe, but even if they don't exist in some reality, they at least exist in concept. And it strikes at the heart of this duality that's been one of the central mysteries, really, of a lot of human thinking: particle versus wave, discrete versus continuous, the abstract and ideal versus the real and messy. And, as you pointed out in this call, and I think also in some of your papers, in your view the ultimate solution, let's say, for creating artificial intelligence, or synthetic intelligence, is a hybrid system that can do both of these types of reasoning, maybe in multiple layers. So I'm kind of curious: where is the state of the art now in actually implementing hybrid systems? Is it capsule networks? Is it the topological neural networks that we talked about? Where lies the direction of some type of hybrid system that, in a unified way, is capable of doing both of these modes of reasoning, if you will?

"Yeah, that's a great question. So this is definitely an active field of research, but I think the most promising direction right now is going to be discrete search: very much a system that is discrete-search-centric, that has a DSL and so on, basically program synthesis, but one that is getting lots of help from deep learning models. And there are two ways in which you can incorporate this type 1 sort of thinking into a fundamentally type-2-centric system. So, one way: basically, you want to apply deep learning to any sort of dataset where you have an abundance of data and your data is interpolative. One example would be being able to use deep learning models to generate a sort of perception DSL that your discrete search process can build upon. Let's look at ARC tasks, for instance. For a human looking at an ARC task, the very first layer through which they're approaching the task is by applying, basically, perception primitives to the grid they're looking at. They are not actually analyzing the grid in a discrete way, like cell by cell, object by object; they're approaching it holistically: what do they see? And these outputs are discrete concepts, and then you can start applying discrete reasoning to them. So that's generating the DSL; and by the way, the reason this is possible is that humans have access to tons of visual data, and these different frames share lots of commonalities, so it is an interpolative space where deep learning is relevant: intuition, perception. And the other way, which is much more difficult, is basically being able to provide guidance to the discrete search process. Because even though one single program, for instance the program for a single ARC task,
is not a good fit for a deep learning model at all, because you only have a handful of examples to learn from and the program is super discrete, not easily embeddable in a smooth manifold, here's the thing: the space of all possible programs, for instance the space of all possible ARC tasks and of all the programs that solve them, is actually very likely going to be interpolative, at least to some extent. And so you can imagine a deep learning model that has enough experience with these problems, and with their algorithmic solutions, that it can start providing directions to the discrete search system. So basically you have layers of learning. The lowest layer is going to be perceptual; it's going to be learned across many different tasks and many different environments; it's going to be type 1. Then you're going to have the context-specific, on-the-fly problem-solving system, which is very much going to be type 2. And the reason it's going to be possible and efficient is that it's going to be guided by this upper layer, which is going to be type 1, which is also going to be trained from very, very long experience across many different problems and tasks, and which is able to do interpolation between different tasks."

Can I challenge you a little bit, maybe? Because you say all of these problems, and what humans do, involve a kind of interplay between the interpolative systems and the discrete systems, and I see that working for something like an ARC task, or if you really write code. But if you come to, let's say, the highest levels of human intelligence, which to me seems to be navigating social situations, which is ultimately super complex: I can imagine something like the graph structure you're referring to being, let's say, I come into a room and I see, as a graph, what kind of social dynamics exist in this room; you know, this is the father of this person, and that person is kind of angry at me, and so I need to do something. And my question is, how often can you really map this in a discrete way to another graph? Isn't every situation going to be a little bit different, even in terms of its graph structure? And, you know, even in an ARC task, if a line is just a little bit squiggled, any program synthesis approach would have a hard time with it, I feel. Or do you think I'm misunderstanding something here? Like, how discrete is really discrete?

"That's the purpose of abstraction. The purpose of abstraction is to erase the irrelevant differences between different instances of a thing and focus on the commonalities that matter. So if the squiggle in your line is not relevant, then the proper abstraction for a line should abstract it away."

I was going to pick up on that, because your main point, basically, is that program-based abstraction is more powerful than geometric-based abstraction, because topology is robust to small perturbations. But it's more than that; it comes back to these analogies: we actually have functions and abstractions in our mind that, as you say, will take away all of the irrelevant differences but focus on what's salient and what's generalizable.

"Yeah, exactly."

So, in the big sense, do you think that type 1 and type 2 reasoning are really different, or is there also a continuum between them? Like, you say we need
hybrid systems, but is there something... right, yeah, because they're both in the brain; they're both running on the same neurons. Is there a continuum?

"Right. So, yes and no. I do believe they are very qualitatively different; these are the two poles of cognition. But most things we do with our mind are a combination of both. That doesn't mean it lies somewhere in between: it means it's a direct combination of one pole with the other, kind of like what I described with the ARC solver, with three layers: two layers that are type 1 and one layer in the middle that's type 2. But in very much the same way that you can embed discrete programs in a smooth manifold, you can also do the reverse, meaning you can basically encode an approximation of a geometric space using discrete constructs. In fact, if you've done any sort of linear algebra on a computer, that's exactly what you're doing: you're actually manipulating ones and zeros, but somehow you're able to have vectors of seemingly continuous numbers, and you can compute the distance between two vectors, and all of this is an approximation that's actually grounded in discrete programs. So you can actually kind of merge the two together. It's not necessarily always a good idea; in particular, I think it's often not a good idea to try to embed an overly complex or overly discrete program in a continuous space, as I was mentioning earlier. The reverse is actually usually way more tractable. And by the way, and I think this is something that came up before in our conversation, my kind of subjective, totally-not-backed-by-any-evidence opinion of how the brain works is that, fundamentally, it's doing type 1 on top of type 2, using a discrete system, because it's actually much easier to do type 1 via an approximation of a geometric space encoded in a discrete structure than it is to do the reverse."

Yeah, and if I can, just for the benefit of the listeners, give some other examples: in mixed-integer optimization, it's often the case that you take the problem, and instead of keeping these discrete values, you project it into a continuous space, do a continuous optimization, and then, as you get sort of close to a good optimum, you discretize back over into the discrete variables, to kind of flesh out the most optimal path within that discrete space. Another example is the gamma function, which is a continuous generalization of the factorial, and it kind of provides some cool and interesting behavior in between those poles, which show up very clearly on the graph as these discrete points. And this is this bizarre duality between the continuous and the discrete that we see throughout the universe, and it's kind of one of the strangest things we have to deal with.

"Yeah, exactly."
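Keith's gamma-function example, concretely: Gamma(n) = (n-1)! at the integer points, with a smooth curve passing between them.

```python
# Sketch: the gamma function interpolates the discrete factorial.
import math

for n in range(1, 6):
    # Identical at the integers: gamma(n) == factorial(n - 1)
    print(n, math.factorial(n - 1), math.gamma(n))

# ...and it is defined between the discrete points too:
print(math.gamma(2.5))  # ~1.329, smoothly between 1! = 1 and 2! = 2
```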
I just wonder what some of the transformers folks must be saying now, because we had Max Welling on, and folks have done topological applications using transformers or graph neural networks; and AlphaFold, the thing from DeepMind, was looking at graph isomorphisms, at different types of equivariance in topological space. Is it a naive thing to say that we could make it all continuous, or are we on a hiding to nothing?

Right, so I guess the question is: is there one approach that's going to end up being universal? Can you actually scale deep learning to handle arbitrary discrete problems? And the answer is no, actually. By construction, due to the very nature of what deep learning is (parametric, continuous models, in fact smooth, because they're differentiable, trained with gradient descent), it is never actually going to be a good fit for most discrete programs. And the reverse is true as well. You have basically two engines that you can use to learn programs: you have gradient descent, and you have discrete search. And I think the reverse is also true: discrete search is not going to be this universal approach that beats everything. I truly believe that the AIs of the future will be truly hybrid, in the sense that they will have these two engines inside them; they will be able to do gradient descent and they will be able to do discrete search, and they will use each appropriately.

You said, by the way, in your Measure of Intelligence paper that there are three types of priors: low-level sensorimotor priors; meta-learning priors (that's the interesting one, I think that's what intelligence is); and high-level knowledge priors. And then we get over to the ARC challenge. As you said in your presentation last year, of the two winning entries on that Kaggle challenge, one was doing a genetic algorithm over a DSL, doing what you're talking about, a kind of program search; and the actual winner, who got about 20% accuracy, was just doing a brute force, selecting combinations of operations on this DSL. (A toy version of that brute-force setup is sketched after this exchange.) This absolutely fascinates me, because at the moment that seems like a horrific solution, but clearly no one could do anything using deep learning. And this is what you're advocating for: you're saying, for these discrete problems, get a DSL. Now, all the stuff you're talking about, presumably they haven't done yet. The beauty of software engineering is being able to modularize things into building blocks. I love citing this thing from Patrice Simard: he said the reason software engineering is so good is that if I ask you how long it will take you to build the game of Tetris, you will say not long at all. If you look at the size of the state space in Tetris, it's huge, but the reason you'll be confident you can build it in a couple of weeks is that you know you can modularize it into blocks. You can't say the same for deep learning, right? But they don't appear to have done that on the ARC challenge yet.

Yeah, so the solutions we've seen on the ARC challenge so far have been incredibly primitive, and it's actually quite interesting that you can get to twenty percent with very primitive solutions. I think even with today's technology you can go much further, like what I was describing before: learning a DSL that is perceptual, and then guiding discrete program search with an intuition about program space. This is already something you can try today. There's one approach that I was very excited about, that I think is very cool and that I really like: it's called DreamCoder, by Kevin Ellis and folks. Check it out if you haven't seen it; it's very good. I think they're trying ARC now. It's this kind of hybrid deep-learning-plus-program-synthesis engine, and to me that is the direction that is the most promising today.
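For a sense of what "brute force over a DSL" means in practice, here is a toy version; the three grid operations below are invented stand-ins for illustration, not the winning entry's actual (much larger) DSL.

```python
# Exhaustively try short compositions of hand-written grid operations until
# one reproduces every train pair: the brute-force style that reached ~20%.
from itertools import product

OPS = {
    "transpose": lambda g: [list(r) for r in zip(*g)],
    "flip_v":    lambda g: g[::-1],
    "recolor":   lambda g: [[2 if c == 1 else c for c in row] for row in g],
}

def apply(names, grid):
    for name in names:
        grid = OPS[name](grid)
    return grid

def brute_force(train_pairs, max_len=3):
    for length in range(1, max_len + 1):
        for names in product(OPS, repeat=length):  # every op sequence
            if all(apply(names, x) == y for x, y in train_pairs):
                return names
    return None

pairs = [([[1, 1], [0, 0]], [[0, 0], [2, 2]])]     # flip, then recolor 1 -> 2
print(brute_force(pairs))  # ('flip_v', 'recolor') or an equivalent sequence
```

The combinatorial explosion is obvious: with k ops and depth d there are k^d candidates, which is exactly why a learned guide over program space, as in DreamCoder, is the more promising direction.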
So you have a fairly long paper called On the Measure of Intelligence, and in it you make the case that intelligence is something like the efficiency with which we transform prior information and experience into task solutions, as you have said before. In that same paper the ARC challenge is presented, so a naive reader like me assumes there is some connection between what you say about intelligence and solving ARC. My question is: if tomorrow a new team comes and gives you a solution, you evaluate it, it gets, whatever, 95% correct, it solves the ARC challenge; is it immediately intelligent? What would you ask of that system for you to say, yes, that's intelligent, or its intelligence is high?

You would be able to make that conclusion if and only if ARC were a perfect benchmark, but it's not; it's actually very much flawed. So if you solve ARC, are you intelligent? Well, no, because ARC is potentially flawed; that's the thing. The thing you need to really understand about ARC is that it's not the end state of this intelligence benchmark; it is very much a work in progress, and there will be new iterations, especially as we learn more about the flaws. And by the way, last year we ran a Kaggle challenge on ARC and we learned a ton: not necessarily a ton about program synthesis approaches, although there was some cool stuff with cellular automata and so on, but mostly we learned about the flaws of ARC. So there will be future editions. I will tell you this: if you solve the specific test set of ARC as it exists today, you're not necessarily intelligent, because the benchmark is not perfect, it has its flaws. But if, more generally speaking, you give me a system such that any new ARC task I throw at it (I can make some new ones tomorrow, for instance, and give them to your system) it's always solving, then I will say, yeah, it looks like you've got a system that's pretty close to human-level fluid intelligence.

This is one of the things where, look, I like the paper a lot; I think it serves as a really good foundation for us to think differently about how to build intelligence. But I have some issues with it too, and one of them is this sort of necessity for white-box analysis in order to figure out whether or not something is intelligent. For example, suppose time travel were actually possible, and somebody a hundred years from now looks back on your ARC tasks and writes an algorithm that solves all of them, because it actually knows about them already, and then ships it back into the past, and we enter it into the competition. No matter what new ARC task you throw at it, it does well, and you say: well, this thing is kind of intelligent. But we'd be wrong, because in the sense of the paper it has actually just encoded prior knowledge from the future. So we always have to be able to look into the box in order to evaluate intelligence the way you define it in the paper. My questions are: one, isn't that a bit of an undesirable feature? And two, do you have any hopes for a more black-box measure of intelligence?

So basically the fundamental issue is that if intelligence is this conversion ratio, then computing it requires knowing where you start from, and you don't really have a way around that.
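As a loose schematic only, and not the paper's exact algorithmic-information-theoretic formalism, the "conversion ratio" idea as voiced here looks something like:

\[
\text{intelligence} \;\sim\; \frac{\text{skill acquired} \,\times\, \text{generalization difficulty}}{\text{priors} \,+\, \text{experience}}
\]

that is, skill-acquisition efficiency relative to what the system was given, which is why the denominator (what it already knew) has to be visible for the ratio to be computable at all.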
The thing to keep in mind is that the On the Measure of Intelligence stuff is not so much meant to provide a sort of golden measuring tape for anyone's intelligence, or anything's intelligence. It is more meant as a cognitive device to help you think about what the actual challenges are, to help you reframe AI, because I think there have been pretty deep and long-standing conceptual misunderstandings, and that has really been holding the field back. So it's very much meant as a cognitive device. If you take a step back and ask why we are even trying to define intelligence and measure intelligence in the first place, why it is useful at all, I think it's useful to the extent that it is actionable. A good definition and a good measure should be actionable, meaning they should help you think, help you find solutions, and help you make progress. In particular, a good definition is one that highlights the key challenges and helps you think about them, and I think that's what the paper does; and a good measure is one that gives you an actionable feedback signal towards building the right kind of system, in the sense that it will be capable of doing more. That feedback signal is what ARC is trying to provide. The way it tries to control for priors and experience is by assuming a fixed set of priors: every test taker is assumed to have these priors, the core knowledge priors. It controls for experience by only giving you a very small number of examples, and also by making sure the tasks are sufficiently novel and surprising that you're unlikely to have seen very similar instances before. Now, of course, it's super flawed, so this is not a hundred percent true, but this is the platonic ideal we are trying to get to.

For the record, that's a fascinating point to me, that you view this more as a cognitive device to help guide us to produce better intelligent agents. It is not an endpoint; it's not that ARC is the measure of intelligence and now all we need to do is solve ARC. That is not at all the point. Oh darn, because I was doing pretty well on some of the examples, and I was hoping that would mean I was intelligent!

Another interesting point: Keith and I were looking at the paper again yesterday (I haven't properly studied it since last year), and we were starting to talk about an alien that comes in from outer space, where we don't know the priors and the experience. And then I was thinking that, in a way, it might be a kind of lower bound on intelligence. If I play chess and I beat someone with a higher Elo than me, that only really tells me that I'm at least as good as the person I just beat. Similarly, this measure of intelligence only gives you a reading in the situation where you know what the conversion was; if they are not converting anything, then you don't know. And another interesting byproduct of this is that the more experience you get, the less intelligent you get.

So I would push back against that last claim, that the measure of intelligence as I define it is dependent on how much experience you have, because the amount of initial experience you have does not actually change the conversion ratio if you measure it via the right tasks.
If you have a fixed set of tasks, then yes, it does affect it. But if you're able to renew your set of tasks and come up with tasks that are orthogonal to the experience the system has, then it's not actually going to affect the definition. But you're definitely right that if you take a pure black-box approach, where the only thing you can really measure is the behavior of a system, then unless you know how that behavior is achieved, you can't really tell how much intelligence was involved in producing it. If you look at an insect, they're capable of super complex behavior. Are they crazy intelligent? Actually, probably not. And the way you can really tell is by putting these systems out of their comfort zone, getting them to face novel situations, and seeing how they adapt. That's the measure of intelligence: it's adaptability, the ability to deal with novel and unknown situations. But in order to give your system a novel and unknown situation, you need to have this white-box understanding of what it already knows. And that's not really something you can work around.

Can I ask about generalization difficulty? Because I had some difficulty, intuitively, with some of its limiting cases. For example, suppose we're dealing with tasks where we have sets of integers mapped to 0/1 values. The algorithmic complexity will be greatest when that's just a random mapping: I assign 0 or 1 randomly to every single integer. And if I look at the generalization difficulty, it's going to be super high, because the length of the shortest program for any such set is basically the whole set; you'd have to encode the entire mapping as a hash table. So how does this measure account for, or help us avoid, problems where we're confusing generalization difficulty with just increasing randomness?

Well, increasing randomness is a part of generalization difficulty. Generalization is really the ability to deal with the stuff you don't know about, the stuff you don't expect, the stuff you haven't seen before, and randomness is part of that. But you're right that if you just add randomness to a system, you're increasing the generalization difficulty without increasing it in a very interesting way, because you are increasing it in a way that's orthogonal to an intelligent system's ability to deal with it. The best you can do is modify the system to be more robust to randomness, and that's not super interesting. What's really interesting is to test the system's sensitivity to abstract analogies: to make the system face novel and unexpected situations that are actually derived from the past, but in interesting ways, not just random ways. (A crude illustration of the incompressibility point behind this exchange follows below.)

You've run this Kaggle challenge on ARC, and we know from systems such as AlphaGo that bootstrapping AI systems, playing them against each other and so on, can be very valuable. We also know that something like markets can be very efficient and valuable. I imagine a system where you'd have agents creating ARC tasks and other agents solving ARC tasks, with some kind of money moving around, and this could be kind of a powerful engine for research.
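A crude way to see Keith's hash-table point in code: compressed size is used here as a rough, imperfect stand-in for description length (Kolmogorov complexity itself is uncomputable). A random labeling barely compresses, so its shortest description is essentially the lookup table itself, while a structured labeling collapses to almost nothing.

```python
# Random vs structured 0/1 labelings of the integers 0..n-1, with zlib
# compression as a crude proxy for "length of the shortest program".
import random
import zlib

n = 10_000
random.seed(0)
random_labels = bytes(random.randint(0, 1) for _ in range(n))  # coin flips
parity_labels = bytes(i % 2 for i in range(n))                 # 0,1,0,1,...

print(len(zlib.compress(random_labels)))  # large: ~1 bit of entropy per symbol
print(len(zlib.compress(parity_labels)))  # tiny: "alternate 0 and 1" is short
```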
And given that you have (I don't know how much, but you do have) the backing of Google, with a bit of capital in hand, could you imagine there being a push for this kind of thing, or is it, as of now, an intellectual curiosity?

Yeah, so I don't have that much backing from Google around this kind of project. But it would be super interesting to have this kind of two-part system, where one part is generating the tasks and one part is learning to solve them, and you could get them to do some kind of curriculum optimization. The task-generator network would not just be trying to generate tasks that look like ARC tasks; it would be trying to generate tasks that correspond to a level of generalization difficulty and complexity that is right below the limits of the student system that's trying to solve them, kind of like the way a teacher would provide exercises that are solvable but challenging. They shouldn't be easy, they shouldn't be impossible; they should be solvable but challenging, because that's how you get the most growth. (A minimal sketch of that loop follows below.) It's actually a system that's described at the very end of On the Measure of Intelligence. And one thing I point out in the paper, the pitfall you should avoid falling into, is that this system is circular. The complexity you're going to see in your tasks needs to come from somewhere; it's like a conservation of complexity. So this two-part system needs a source of intrinsic complexity; it needs to be grounded in the real world. One way we can achieve that grounding, and I've been thinking about it, is crowdsourcing. ARC tasks as they are today are made by me, and this is not a good setup, because it's going to be biased and it's going to be very bottlenecked as well. I think we should start crowdsourcing ARC tasks. There should definitely be a filtering system, so that we make sure we're only keeping tasks that are interesting, that are not too easy, that are not too difficult, and that are grounded only in the core knowledge priors. But if we have this stream of novel ARC tasks that contain intrinsic complexity and novel information, because they come from the real world, they come from human brains that have experienced the real world, and you use that as a way to ground your task generator, then you're starting to get a very interesting three-part system. So I would love to actually get that started, to produce a v2 of ARC as soon as possible that would include ten times more tasks and that would be crowdsourced, maybe something that would take the form of a continuous challenge, where you have an API from which you can draw a new ARC task, and every time you draw a task it's actually a different one, because you have so many of them.

That would make a fun game, on a mobile app. There are actually a few people (because ARC is open source, with a totally free license) who have created mobile apps where users solve ARC tasks, and apparently it's popular.
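A minimal sketch of that teacher/student loop, with `make_task` and `student` as hypothetical placeholders for the generator and the solver; the success-rate band is an assumed heuristic for "solvable but challenging", not anything prescribed in the paper.

```python
# Keep the generated tasks right below the student's current limit.
def curriculum_step(student, make_task, difficulty, probes=20):
    """Probe the student at the current difficulty, then nudge the difficulty
    so the success rate stays in a challenging-but-solvable band."""
    wins = sum(student.solve(make_task(difficulty)) for _ in range(probes))
    rate = wins / probes
    if rate > 0.8:                       # too easy: push toward the limit
        difficulty += 1
    elif rate < 0.3:                     # too hard: back off, keep a signal
        difficulty = max(1, difficulty - 1)
    return difficulty                    # otherwise hold: the growth zone
```

The circularity Chollet warns about is visible here: nothing in this loop injects new complexity, which is why `make_task` would need to be grounded in a crowdsourced stream of real human-made tasks.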
There's also the other angle you mentioned in the paper, which is pretty fascinating, and you were almost touching on it just now: starting to think about how to map ARC performance to classic psychometric tests. Are there any efforts that you're aware of underway right now to do that? Are you involved? Any ETAs?

ETAs, I'm not sure. So there are two groups. We did a workshop at AAAI the other day, and there were two presentations about such efforts. There are people who do neuropsychology, and they are using ARC in very interesting ways; there's a group at NYU and there's a group at MIT, and they are using ARC for neuropsychology experiments. It's super cool.

Amazing. I want to switch over a little bit, because other than the measuring of intelligence, you are of course also famous for a little library you wrote a while back called Keras. I wish I wrote it!

And that was that? No, it's been very much an ongoing project for the past six years.

I remember the days of TensorFlow 1 and Theano and things like this, and Keras was, I think, so helpful to a lot of people, because it simplified all of this graph construction and whatnot; it just made it accessible to so many people. And now, with the development of things like PyTorch and TensorFlow 2, it almost seems like Keras has been kind of absorbed by TensorFlow 2: there is tf.keras, and now I think the newest APIs are even making that vanish a little bit. Do you see Keras going away? Do you see it changing? Where do you see Keras going?

Going away, definitely not. We have more users than ever before, and we're still growing very nicely, both inside Google, where more and more teams are moving away from TensorFlow 1 and adopting Keras, and outside Google as well. It's a big market out there, and there's definitely room for multiple frameworks. Evolving, absolutely: Keras is constantly evolving, but evolving with continuity. If you look at Keras from 2015 or 2016 and you look at Keras now, you recognize it's the same thing, it's the same API, and yet it's actually a very different and much, much bigger set of features and things you can do with it. And you asked whether Keras getting merged into TensorFlow means it's fading away. Definitely not. Merging with TensorFlow was a good idea, because it starts enabling a spectrum of workflows, from the very high level, scikit-learn-like, to the very low level, and everything in between. In the early days, because Keras had to interact with multiple backends, you had a backend interface, which meant you had this kind of barrier: as long as you used the Keras API, everything was super simple, scikit-learn-like, very easy, very productive, very fast. But if you wanted more customization, at some point you would hit that backend barrier, and you had to revert to a TensorFlow-based or Theano-based workflow that was low level, where you couldn't really leverage Keras effectively. By removing the backend barrier and tying the whole thing together into one spectrum, you get this progressive disclosure of complexity: you can start out with a very high-level thing, but then if you need to customize your training step, you have an API for that, and you can mix and match seamlessly the low-level TensorFlow stuff with the high-level Keras stuff. That way you can work with Keras and TensorFlow at the level of abstraction that you want: very easy and high level, or very low level with full flexibility. It's entirely up to you.
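That progressive disclosure is concrete in code. The sketch below follows the customizing-fit pattern from the Keras documentation of the TF2 era: override only `train_step`, drop down to raw TensorFlow for that one piece, and keep everything else (compile, fit, callbacks, metrics) high level.

```python
import tensorflow as tf
from tensorflow import keras

class CustomModel(keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:          # low-level TF only here
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

# Everything else stays high level: fit(x, y) now runs the custom step.
inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```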
I'm going to point out the temptation here to analogize this with connecting type 1 and type 2 reasoning.

Yeah, why not! Wait, I was just about to do that. And François has great form for this, because not only does he talk about having powerful and useful interfaces and abstractions in deep learning, he's been playing this game in the library world for quite some time.

I wanted to touch on this quickly: we had a couple of people in our community asking you about Keras, actually. Robert Lange and Ivan Finno said that apparently Theano has returned, with JAX and XLA underneath, and they want to know: are there any plans to add a Keras backend for it? Robert Lange also asks: would you add JAX on its own as a backend? We've also had a couple of questions about PyTorch; is there anything on the roadmap for that?

Okay, so let's talk about JAX. I think JAX is an awesome project, and the developers have really done a very interesting and very good job with it, and lots of people like JAX. I like JAX, actually. That said, adoption is not super high. I think Google is probably the company where it's most adopted, and even then it's a tiny fraction of total ML usage at Google. But as a project it's a beautiful project: it's elegant, it's powerful, it's great. So would I like to add a JAX backend to Keras, or a PyTorch backend? I want to say we've really moved away from this multi-backend kind of model, precisely for the reason I was describing: you want to achieve this spectrum of workflows without having a cliff where you fall from the high level down to the low level. We don't want cliffs, because cliffs create silos of users, where you have the high-level users and then everyone else. You want a gradient.

Yeah, you want the gradual spectrum.

Exactly. That said, I think it would be super cool to have a sort of reimplementation of the Keras API on top of JAX that would also achieve this gradation and that would still follow the Keras API spec; it would still be the same thing, but on top of JAX. I would love to see something like this, but it's also very low priority for us, because we have the actual current Keras, which we need to work on and which has lots of users, so we don't really have time to do it. But in theory, would it be cool? Yeah, sure. If I had tons of free time, I would probably build it, but in practice...

Fantastic. We got another question from Giovanni, actually. He asks what François thinks of Kenneth Stanley's book on the myth of the objective. Are you familiar with Kenneth Stanley's work on the tyranny of objectives and open-endedness?

So I'm vaguely familiar with the name; I'm not really familiar with the book.

Not to worry. Ken has been a huge inspiration for me, and he talks a lot about objectives leading to deception: sometimes following an objective monotonically sends you in the wrong direction. And his solution to that is either quality diversity or, more recently, open-endedness, which is the idea that if you have an infinitude of objectives, then in a sense the system has no objective,
and, with diversity preservation, you can overcome deceptive search spaces. You might have heard of the POET algorithm, which he was involved in.

Yeah, absolutely, I'm aware. And when it comes to that description of the problem with objectives, I completely agree. One thing I mention in the paper is the shortcut rule, which is that if you try to achieve one thing, one objective, you're going to achieve it, but you're going to take every shortcut along the way on the things that were not actually incorporated in your objective. And this leads to systems that are not actually doing what you wanted them to do. For instance, we built chess-playing systems because we hoped that a system that could play chess would have to feature reasoning, learning, creativity, and so on. It turns out it just plays chess; that's what it does. The same is true with challenges on Kaggle: the winning systems just optimize for the leaderboard ranking, and they achieve it, but they achieve it at the expense of everything else you might care about in the system. Is the codebase readable? No. Is it computationally efficient? No; it's actually terrible, you could never put it in production. Is it explainable? No. And so on. If you optimize for something, you get it, but you take shortcuts.

Yeah, exactly, and that's very much what Kenneth says as well. I love what you said about shortcuts. You said in your newest presentation that if you optimize for a specific metric, then you'll take shortcuts on every other dimension not captured by your metric, and you said that in a machine learning context this is similar to overfitting: by chasing task-specific skill you actually lose generalization, so it's completely orthogonal to what you want. Now, I know you're very well known for your skepticism of the intelligence explosion, and what I love about your conception of intelligence is that you think of it as a system, as a process. You say that intelligence is embodied: you have a brain in a body acting in an environment, and in that context it makes sense that you would think there are environmental rate-limiting steps to any kind of superintelligence. But I spoke to someone the other day who is of the other persuasion, shall we say. You said in your rebuttal that if you look at the IQ of the smartest scientists, a Richard Feynman for example, it's much the same as that of a mediocre scientist; it turns out that IQ only helps up to about 125, and then it stops helping you. But these people would say: what if every single scientist was an Einstein? Intelligence is just making better decisions; they would consistently make better decisions and science would accelerate. A chimp doesn't understand how good a human is, so how would we understand what a superintelligence would do? They'd invent nanotech, they'd upload themselves into the matrix, they'd do all of this stuff, and somehow they would miraculously overcome everything. Do you know what I mean? How would you respond to that?

Yeah, if every scientist were super intelligent in human terms, that would in fact accelerate science, but it would not really accelerate science in a linear fashion, and very much not in an exponential fashion.
I guess the main conceptual difference I have with these folks is that they tend to credit everything humans can do to the human brain. They have this vision of intelligence as a brain-in-a-jar kind of thing: if you tweak the brain, it gets more intelligent, and intelligence is directly expressed as power. If you're more intelligent, if you have a higher IQ, you can do more things, you can solve more problems, and in particular you can build a better brain. And, by the way, there is not really any practical evidence that that's true. I view intelligence as a more holistic thing. Okay, you have the brain, but the brain is in a body, which gives it access to a certain set of actions it can take and a certain set of perception primitives; and this body is in an environment, which gives it access to a set of experiences and a set of problems it can solve. And to a very large extent, the brain is not so much a problem-solving engine as it is a big sponge: you put it in an environment and it will absorb experiences from that environment. One thing that's super important to understand, if you really think deeply about intelligence, is that most of our expressed intelligence does not come from here; it is externalized intelligence. Externalized intelligence can be many things. If I look up something online, that's externalized intelligence: Google is part of my brain. If I write a Python script to test some idea, that's externalizing it: my laptop is part of my cognition. And it actually goes much further than that. Most of our cognition is the crystallized output of someone else's thinking, and the process through which we get access to all these accumulated outputs of other people's thinking is civilization. Like 99% of the behaviors you execute: you did not invent them, you did not solve the underlying problem yourself, you're just copying a solution you've seen. We're in the middle of the pandemic, so you're probably washing your hands after you go outside, and that's very smart behavior. But did you invent it? Did you come up with that? No, other people came up with it, and you also did not come up with the infrastructure that enables you to do it in the first place. And this is true even for the most intimate of your thoughts: you're thinking with words that you did not invent, you're thinking with concepts that you did not invent and did not derive from your own experience; they really come from other people, from this accumulation of past generations. If you want to enhance the expressed intelligence of people, this is actually the system you need to tweak and improve: not the human brain, but civilization.

In a way, that seems like a contradiction, because you're talking about the externalization of knowledge, not intelligence. By your own definition, isn't that the opposite of intelligence?

That's a great point. I was specifically saying expressed intelligence, as opposed to fluid intelligence, and what expressed intelligence means in this context is something very different from what we talk about in the Measure of Intelligence. It means intelligent behavior, and in particular, I think, the ability to solve the problems you encounter as an individual.
Typically, when you solve a problem as an individual, you're actually using a solution you've found somewhere else; there are not that many problems that you, as an individual, solve from scratch in your own lifetime. But here's the thing: if you are able to actually solve something novel yourself, you have the ability to write about it, you have the ability to communicate it, and then the next generation can benefit from it.

So let me pose a kind of counterargument to this. Suppose you're reading a novel about, I don't know, a kind of Planet of the Apes: a planet that had a life form similar to ours but with a significantly lower IQ. And a human being shows up there one day, and these beings start writing about it: hey, this weird alien just showed up here, we captured it, we ran some tests on it, and we figured out it's really intelligent, much more intelligent than any of us are, and we're worried about what's going to happen when a hundred of them show up instead of just this initial explorer. And some others of these guys say: ah, don't worry about it, they've got two legs and two arms like us, and most of what they are is kind of outside of their brain, so I'm not really worried. We would be reading that with trepidation, right? Because we know that when this more intelligent species, with more fluid intelligence, more externalized intelligence, better technology, all this kind of stuff, shows up, those guys are going to get wiped out. And it's actually happened many times throughout human history: not that humans who were more fluidly intelligent showed up and killed off other people, but humans who had more externalized intelligence, more represented intelligence and technology, certainly showed up and dominated.

Absolutely, and you're saying it yourself: it has happened in history, and it was not fundamentally about one people having smarter brains, but about one people having higher technology.

But that is not something that is attributable to intelligence itself, right? There's a connection there: if you did have a group, or a species, or whatever, that was much more intelligent, they would have advanced technologically much faster and further in any given amount of time, all else being equal.

It depends on many factors, and that's kind of my point. Is your brain a factor? Yes, absolutely it is. But there are other factors. We were just talking about the development of technology; in that case, the critical factor was not the brain but this superstructure, in particular communication, and the environmental constraints around it. The direction in which a civilization develops is a direct function of the specific challenges it encounters, which come from its environment, from its surroundings, and so on. Technology development advances the fastest when you have civilizations that are dealing with very harsh challenges, but challenges that are not quite large enough to wipe them out, because that's what forces them to develop as fast as they can. It's survival-based. So this is actually a very good example where the critical factor that guided the rise of a civilization was the superstructure, not the brain. Of course, yes, if everyone were smarter, civilization would advance faster, but my point is that there are many factors, and by tweaking one factor, the brain, if the brain stops being the bottleneck, then immediately some other factor will become the bottleneck.
There are civilizations that have not actually advanced very much at all, because they simply did not face any challenges. Did they have worse brains? No, actually, they had exactly the same brain, but somehow the outcome was different, because something other than the brain turned out to be the limiting factor, like a lack of environmental change, for instance.

I'm fascinated by scale and bottlenecks in systems, actually. I work in a large corporation, and when you have role fragmentation and lots of different businesses and lots of different organizational structures, some people might decide to structure themselves based on data domain, or based on organization, or based on something else; you can think of it topologically. I think human society is very similar to this, and I'm not sure whether evolution would lead to one particular topology, but the environmental structures and the ways we organize ourselves can create incredible bottlenecks, and that seems to be where the really interesting stuff goes on, rather than in the individuals. I think you would agree with that, François?

Yeah, absolutely. If you take two companies, and in one company the average IQ is like 15 points higher, but it has a terrible organizational structure and terrible incentives and the promotion process is super broken, then that company is actually going to perform worse than the more progressive, innovation-encouraging company that has a very nice organizational structure and where the people are actually more mediocre; maybe they have on average 15 IQ points less, but they're going to do a better job, because they have the better superstructure.

Yeah, it's fascinating. The problem is that in most corporations you can't actually design the information architecture to be more efficient, because everything is so decentralized and fractionated. You can only do it in pockets, and if you try to fix something in one part of the organization, everyone else will say: well, my requirements are different, I'm not going to wait for you, I'm going to do it my own way. It's actually a really, really difficult thing to do well.

To sum up the whole intelligence explosion thing: the point is really that it's a system you have to look at holistically to get it. By just tweaking one factor, the intelligence of an individual human brain, what happens is that this factor stops being the bottleneck, and some other factor in the system, because there is effectively an infinity of factors at work, will become the bottleneck. By just focusing on one factor, you're not going to lift all the boats.

And I actually agree with you. However, I do want to say, I think we just don't know. Both sides of the intelligence, quote unquote, explosion debate really can't say for certain whether it will or will not pose a mortal threat to humanity. I think we have to accept that it's at least a risk factor, and we have to be very careful in the future, when we start embodying things, if we find general intelligence; we need to be cautious.

If we come up with something that looks like general intelligence, there is absolutely some risk potential around it. However, I've never seen anything coming anywhere close to that; in fact, the systems we have today feature almost no intelligence whatsoever. So I think it's a bit early to worry about any of this.
And even if we get into that conversation, I think François would say that intelligence must be specialized, because of the no-free-lunch theorem.

If you define intelligence as your ability to solve problems, then yeah, it's going to be specific to a scope of problems, a kind of problems. What the no-free-lunch theorem is saying is basically that if you want to learn something from data, you have to make assumptions about it, which is why a convnet, for instance, is a great fit for image data but not really a great fit for natural language processing: it makes different assumptions about the structure.

It doesn't give me a lot of comfort, though, because I'm fairly certain that whatever the first AGI is that gets created, it's going to be highly specialized for killing other people, because it's probably going to be a military secret project that finds it.

I don't know, but what I know is that right now we don't have anything coming close to AGI. It's probably going to be a system that just displays you ads. If we look at where the most money is right now, the first AGI is probably just going to, not only display, but write the perfect ad for you, on the fly.

I know you're joking.

But I actually think an ad AI, or something like that, is highly unlikely, because of the shortcut rule. I don't think a general intelligence is going to be created by the military, and it's not going to be created by a system that's trying to show you ads, because these are specific goals, and if you try to optimize those specific goals, you're going to end up with a very specialized system. In order to build a general intelligence, you need to be optimizing for generality itself. So it's either going to come from the academic side, where you have researchers who are actually optimizing for generality itself, with curiosity as the angle, or, if it comes from the applied side, it's going to come from people who have problems where they have to deal with extreme novelty, uncertainty, and in-practice unpredictability. So it's not going to be ads, and it's not going to be the military. I don't know what it's going to be.

One of the things that interested me about Kenneth Stanley was that he says the reason we can't monotonically optimize on objectives is deception, which means sometimes you need to get a lot worse before you get better. His original conception was quality diversity, which basically means that if you optimize for novelty, that's something you can optimize on monotonically. And if you look at evolution, where there is a cacophony of problems and solutions being divergently generated, then as an information accumulator you can optimize on that monotonically. Your conception of intelligence is generality, and that also appears to be a monotonic increase throughout advancing levels of intelligence, so I think that's quite interesting. Anyway, François, truly, this has been my dream come true, to have you on the show. Thank you so much; it really means a lot to us.

I appreciate it, thank you. Thanks for having me on the podcast; it's really my pleasure, this was super fun. Thanks a lot.

And thank you for Keras, by the way!

Thank you, I'm glad it's useful.

We're going to jump straight into the post-show analysis.
Okay, well, I'm going to mention: you did really well, Tim. That trickle of sweat that was running down your face the whole time? Not very noticeable, so I think you can relax. That was fun; I think it went pretty well.

Yeah, it was a dream come true. And I was very pleasantly interested in how he framed the Measure of Intelligence paper: look, it's not really about the measure per se; it's a cognitive framework, a cognitive tool for thinking about where to go, and a guidepost for building more generalizable, or more general, intelligences. I totally agree, and it's quite a fascinating goal: here's a framework to help us think more in the direction we need to be thinking.

Yeah. And it's so surprising that the ARC challenge is at only like 20% solved, because he self-admits that it's flawed, right? He makes the tasks, there are only finitely many, and you see the kind of tasks he makes in the public set; you would think that someone would come up, not with an intelligent thing, but with a smart set of shortcuts to solve that sucker. But it's still at 20%. I don't know whether that's due to not many people investigating it, or whether it's really actually a hard problem.

Well, it's fascinating too, because if he achieves what he wanted, getting it crowdsourced, getting intelligent people all around the world contributing ARC problems and refining them over time, I think that community project would actually help the core knowledge people, in that line of research, figure out a catalog of all the core knowledge. Back in school we used to call these prime thoughts, because we would play these brain teasers all the time, and we realized there were patterns: this brain teaser requires the concept of coloring, as with a red-black tree, where you add an additional variable that lets you solve the problem. If we could really have a nice catalog of all the core knowledge, all the problem-solving techniques, I think that would be really powerful.

Well, we kind of have that. Elizabeth Spelke came up with about six core knowledge systems, and the ARC challenge uses four of them: one, objectness and intuitive physics; two, agentness; three, elementary geometry and topology; four, numbers, counting, and quantitative comparisons. The two that weren't in there are places and social partners. The thing is, I think we may discover new ones.

Well, yeah, maybe we will, but I'm surprised that we did as well as 20%, because if you think about it: imagine if you just guessed the classification on ImageNet, where you've got a thousand classes; 20% would be amazing, wouldn't it? And we've got a similar amount of diversity of tasks on ARC. What's interesting as well is that all of those different tasks created by François tie back to just four priors. I don't know whether it's uniformly distributed, but 20% seems really good for just guessing ops on a DSL.

There are two things. First, I would have thought that if someone came up with something that solves more than five percent, it would immediately be at 95%, just because they'd have sort of cracked the problem.
And then there might be a few outliers, but I would guess ARC is the kind of benchmark where, if you hit the correct solution approach, it's boom, you're there. And that's not what happened, which is surprising. The other thing is, I don't find it surprising that there are so few priors. What I do think is that the space of these priors is still way too large. If you just think about something like objectness: in these ARC tasks there are, I feel, so many more priors than just the core knowledge things. One of them is: you have this thing, and then you have this other thing, and the solution is that it goes boop, it bounces. And there's the fact that we recognize that this is a wall or something, but there is no prior that says a wall needs to be straight; the wall could be any old shape at all. The fact that it is straight is much more core knowledge: we build stuff out of straight walls.

I think I agree with you. Correct me if I'm wrong, but what you're getting at is that the way the core knowledge is specified right now is vague; there's a vagueness to it. And if we actually start to codify it more in some type of mathematical language, Tim, I think it's going to expand in scope; we're going to end up with more core knowledge concepts than just six, and we'll need to make them finer grained. I'm really excited to see that develop, because this has been a long wonder of mine: what, in a rigorously defined way, are these core concepts, these core bits of knowledge, that make human cognition so powerful?

Yeah. And there's also, because Yannic made the point about brittleness: even in topological space you still have brittleness, but the solution was to create powerful abstractions. How would that work with the priors? Because if you think about it, you can recombine many of the priors to come up with powerful abstractions, and you might find that it doesn't actually filter down to that many. But the question is: how many things are there? Remember when we spoke to Walid Saba: he was talking about having them somewhere in a PowerPoint deck; he just wouldn't give them to us.

Part of why I agree with Yannic that finer-grained concepts are more important probably stems from a lot of the computer science education I had. When we were devising algorithms to do one thing or another, you'd get these little hints, these clever bits of core knowledge used to solve the problem. Like when you study quicksort, and it's: you know what, I'm just going to randomly choose an element (well, random selection is kind of a bit of core knowledge), and then I'm going to partition by that, and then repeat. (A small illustration of exactly that combination follows below.) Or things like: I don't know how to balance this tree the way it is, but if I color the nodes red and black, I can now overlay a computation that solves it. There are all these little bits, and that's what's fascinating about computer programming: it really strikes at the heart of this cognition and this core knowledge, and you have to do it rigorously. You can't just vaguely go, oh, you know, kind of sort them and merge them; you've got to define what that means.
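Keith's quicksort example, spelled out: two reusable core-knowledge moves, random selection and partitioning, composed into a complete algorithm.

```python
import random

def quicksort(xs):
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)                 # core trick 1: random selection
    left  = [x for x in xs if x < pivot]      # core trick 2: partition...
    mid   = [x for x in xs if x == pivot]
    right = [x for x in xs if x > pivot]
    return quicksort(left) + mid + quicksort(right)  # ...then repeat

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```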
And it's fascinating. Dynamic programming, yeah.

I'm always a bit amazed by people who have just kind of informally learned programming, because it's almost like a different world, in that they'll go: oh okay, I need to solve this problem, can I copy-paste this code here? And it works like 20% of the time, but not fully.

But then, on the other side of the coin: when I was working in quantitative trading, we had these massive, globally integrated, automated trading systems, and some of the bizarre, I don't want to call them hacks, some of the bizarre piecewise-linear equations-slash-hacks actually work in reality. You sit there and you look at them. When I first went in there, fresh out of academia, I started seeing things like this and thought: this is crap, I'm going to figure out some continuous equation that fits this piecewise-linear thing, and it's going to do better. Nope, it didn't do better; I couldn't find any continuous thing that did better. An option's payoff is this piecewise-linear thing, and you think there should be some continuous thing in there. All these weird piecewise, hybrid things between the continuous and the discrete work, and that was weird to me, and it's still weird to me. Interesting.

I've got to say, my main three take-homes from Chollet today (I really love Chollet): one, intelligence is generalization; I think that's super powerful. Two, his idea that deep learning is really good for value-centric abstraction, because of the manifold hypothesis: lots of natural data has some kind of manifold which you can interpolate on, but lots of discrete problems do not have that. My mind was wondering whether, because of SGD, you couldn't even learn the manifold even if it did exist, but he's saying that for discrete problems it doesn't exist; the manifold might be there, or it might only be there in parts. So that was interesting. And the third thing that fascinated me about Chollet is that he talks about these systems, and bottlenecks in systems, and says we shouldn't be thinking about individual brains; we should be thinking about the externalization of knowledge.

Yeah. And the way he described what he thinks a hybrid system should look like: you have a perception layer, and then a discrete search layer, and then on top of that another fuzzy layer that guides the search, which can be deep learning again. I think we're halfway there. The top very much looks like AlphaZero, which is a discrete search guided by neural networks, and the bottom layer we have too, because that's just our regular neural networks. I think we have big trouble in how to connect the two in a single, unified way, such that we can learn them end to end, because the best we can do right now is plug a pre-trained network into AlphaZero or something like this; we don't really have it figured out yet how to connect all this stuff. A good example of that is the Neural Turing Machine: it's so hard to optimize. And not only do we need these three components that nicely integrate and are optimizable, we also have to be able to modularize
and componentize and connect multiple instances of those things together in some weird topological network, to really achieve the capsule-network kind of vision, where each capsule is maybe one of these units, and they're part of these fractal layers of those pieces.

I don't know whether I was misunderstanding you before, Yannic, but with the AlphaZero thing, my conception is that it has been quite hard-coded: you're searching through, let's say, a fixed space, and the way you search is quite opinionated. What Chollet is talking about is: have a very basic DSL, and in that space you just search, and you start to modularize, and you start to create functions and abstractions; from a software engineering point of view, you start to build a library of functions, written in code, that do certain things. That's different, isn't it, to AlphaZero?

Well, AlphaZero is made specifically to search over actions in some kind of RL space. What he describes is certainly much more abstract, in that you search over applications of the DSL, and the DSL itself is a perceptual DSL that is in itself described by these lower-level neural networks. But it just came to my mind when he described the system: the top part looks very much like AlphaZero, because neural-network-guided search is something we already do.

I think the reality is even a bit more fuzzy, because what you do as a human also has a hierarchical aspect to it, in that you can do this hierarchically: you have this high-level search, and then each of the search steps goes through maybe a fuzzy thing, but then you search again to solve the sub-problem. And you can do it at will, too. You can scan an image, and you get this type 1 process that finds a bunch of objects, and then you do this type 2 thinking where you start reasoning about those, and in your mind you can kind of zoom in on one: let me zoom in on that tree, and now I've got the pieces of the bark as objects, and bugs, and I can reason about those. So you have this ability to transcend the process, and tune it, and move it around.

Yeah, that's the whole consciousness aspect, right? Even apart from intelligence, you have the ability to introspect the whole thing, and that is probably a big part of intelligence. I guess you could have intelligence without consciousness, but there is an argument to be made that the fact that you can introspect your own processes contributes in large part to the furthering of intelligence.

I would separate consciousness and intelligence. But the thing that hit me the most in his newest presentation was when he said intelligence is literally sensitivity to abstract analogies. We were talking about the kaleidoscope. The main thing with intelligence is that there is so much repetition in the universe, but it's repetition in this funny way; it's sort of fuzzy repetition. Sure, the solar system kind of resembles galaxies, and so on, but then there are these little weird differences, these asymmetries.
The universe is a fascinating place; I mean, something like that, right.

When you say you have to make analogies, I can absolutely see that. I think my question earlier was formulated a bit dumb, when I asked about the squiggly line. What I meant is that in that case it's not a line, it's a squiggly line. And the same with the social situations: okay, that person over there kind of doesn't like me, but then in the next social situation it's a person that doesn't like you and has a gun, or something like this, or a group of people. Sure, they are similar in some way, but it's never the exact same thing. So this reasoning by analogy does work, but you always make your little modifications on top, specific to the situation. And I'm sure there's a place in this framework for that, but again, it's a lot more complex.

I think that's what he calls abstraction. At least, prior to today, my concept of abstraction was similar to that: it's removing the insignificant details. You're able to take some object, thing, or situation, it doesn't matter, and strip away all the stuff that doesn't matter for whatever your purpose is. That's abstraction. And I think one of the weird things, and this is kind of the unreasonable effectiveness of mathematics, is that abstracting actually produces things that are useful. The fact that abstraction helps with generalization is a very not-well-understood mystery, in a sense: why should abstraction help you generalize? But it does; in the real world, that's what happens.

Yet abstraction has to be somehow specific to what you want to do. An apple is an apple only if you're looking for food versus non-food; it's a sphere if you want to shoot it out of a potato cannon. But when it comes to separating fruit by ripeness, then an apple isn't just an apple anymore; all of a sudden this apple has much more in common with this orange. So even how you abstract: it's not like we can just plug in our ResNet-50 and then, boom, we get an embedding vector and that's our abstraction. How you abstract is also incredibly specific to what you want to do.

Yeah, and I agree with Saba that this is an empirical question. What these concepts are, or whatever, is an empirical question, and Chollet's ARC project, if it ever becomes this crowdsourced thing, is going to give us lots of data to start thinking about this empirically, and it's going to be really fascinating.

I mean, this needs to be a prime blockchain project, because you can probably even zero-knowledge prove that you can solve a given set of ARC problems; you wouldn't even have to show your solution. People would put up ARC problems, and if you want to try them, you'd have to put up some money, and if you can solve one, the creator of the challenge gives you some money, or something like this. This could be fascinating.
Maybe you could even do homomorphic ARC, right? Where you somehow prove you can solve the problem without ever having seen the problem, just an encryption of it.

Yeah, yeah. Normally homomorphic encryption comes right after blockchain in the same sentence. What else can we get in there? So we've got blockchain, homomorphic encryption; what else can we throw in? Bitcoin! Can't we just say people have to pay through bitcoin if somebody wins a challenge on ARC? We'll get our own token for it: ARC-coin. Oh god, hold on, I've got to get that domain right now.

I want to know, by the way: the whole point of the diversity of ARC tasks is developer-aware generalization, which means the developer could not have conceived of the task. But if all of the tasks are representative of four human priors, then how is that developer-aware generalization? The developer would be aware of all of those priors, right?

But not of the tasks. That's the control. Like he said, you have to know the starting point that you're white-box analyzing from, and the start here is not a clean slate; the start is these four priors. So it's kind of the diff: you give the developer those four priors, and what can the developer come up with just from that?

Yeah, but I think there's a lot of information leakage there, and you implicitly said the same thing, because you said once you solve some of them, you've solved all of them.

Okay, arccoin.com is available, but it's a premium domain, so it's 300 bucks. Should we get it, because it has "coin" in it? I guess we need to figure out something cooler. No ARC-coin. Okay, I don't care enough to grab it.

Right, anyway, we should draw this to a close, ladies and gentlemen. Thank you very much for listening. Yep, thank you; it's been emotional. We've recently reached 10k subscribers, actually, so thank you very much. And we're still going to continue the show, now that we've had Chollet.

Oh yeah? I thought this was the end; I thought we were going to cap it with this one. I mean, to be honest, we might as well just stop now. Anyway, see you. Thanks, thanks, bye.

I really hope you've enjoyed the episode today. Remember to like, comment and subscribe; we love reading your comments, and we'll see you back next week.
Info
Channel: Machine Learning Street Talk
Views: 38,710
Rating: 4.9564586 out of 5
Id: J0p_thJJnoo
Length: 121min 53sec (7313 seconds)
Published: Fri Apr 16 2021