DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning

Video Statistics and Information

Captions
Hi there, I have a little challenge for you: look at these numbers and see if you can figure out what comes where the question mark is. If you look at it for a bit, you'll recognize that this is a sorting task: you're supposed to sort these numbers in ascending order, and that's the solution. I'm showing you this not because it's particularly hard, or because I'm particularly good at sorting numbers, but because this is a core feature of human intelligence that we haven't been able to reach with machine learning quite yet: we are able to look at very few examples and then generalize to new examples. And we do that not the way machine learning does it, by gradient descent on a model, but by coming up with a rule, such as "hey, this is sorting". Even if we didn't know what sorting was, we would still be able to come up with the rule, because we would realize: I need to compare the numbers, pick the lowest one first, then the second lowest one second, and so on. We humans are able to come up with rules to solve a problem, and in a more general sense we're able to come up with a program, an algorithm, that solves the problem. That is the point of this paper: to solve problems not with pure brute-force machine learning, like gradient descent on a data set, but by coming up with rules, with algorithms, that solve the problem. This brings its inherent challenges. It's not a new approach, but this paper makes it more scalable than before.

The paper is called "DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning", by Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sablé-Meyer, Luc Cary, Lucas Morales, Luke Hewitt, Armando Solar-Lezama, and Joshua B. Tenenbaum. The paper describes itself like this: we present DreamCoder, a system that learns to solve problems by writing programs. It builds expertise by creating programming languages for expressing domain concepts, together with neural networks to guide the search for programs within these languages. So the entire model is a system that sees just a few example problems and comes up with programs that solve them, and it does so in its own language: it builds up its own programming language and is then able to synthesize programs in this language that solve the problem, with a neural network guiding the search. That's DreamCoder. It includes this wake-sleep algorithm, which has also been around for a while, but this is a different take on it: the wake-sleep learning algorithm alternately extends the language with new symbolic abstractions and trains the neural network on imagined and replayed problems.

Past ventures into program synthesis have not really been scalable, because they either relied on a handcrafted programming language that you search over, or on handcrafted rules for how you search. This system is much more general, and it can solve a vast variety of different tasks. Here you can see the different types of tasks the system can solve: there is list processing, such as summing lists, doubling each element, or checking for evens; text editing; learning regexes; and also very creative things like creating graphics, building block towers, symbolic regression, recursive programming, and figuring out physical laws.
We've already looked at papers that figure out physical laws from data, but those were geared specifically towards that task; here, the same system can figure out all of these things. Of course it's going to be configured a little differently for list processing versus figuring out physical laws, but it is the same underlying system.

Ultimately, what does that amount to? It amounts to you giving the system a problem. Let's say the problem is to sort a list, which is what we came up with at the beginning. You give the system a few examples, like the three I gave you, and the system comes up with a program. The program is ultimately going to look like the thing down here: a program that implements the list-sorting algorithm. And it does that by a few principles. Principle one: the program needs to fit, to explain, all of the examples, otherwise it's not a correct program. Principle two: it needs to be short, very short, because there are many different rules these lists follow. I could literally implement this as a hash table mapping the three input lists to their outputs, and that hash table would solve the problem exactly as well as the sorting algorithm. But the sorting algorithm is much more compact; it's simply this thing down here.

Beyond that, the system builds up a library of concepts. The system doesn't actually see the program at the bottom; it sees this program right here, which is the sorting algorithm expressed in the system's own language, because the system has built up a learned library of concepts over time. As we train the system to solve different tasks on lists, such as summing a few things or doubling a few things, it builds up this library of concepts. There are these primitives that you give it, and from those it comes up with concepts that we as programmers might call functions. For example, it comes up with a thing that can filter a list. It doesn't have that in its initial primitives, but it discovers it because it uses it again and again, and now it can use that function instead of the primitives. Whereas before it would have had to spell out the entire filtering code, now it can just say: I want to use concept 4. That makes the programs it writes much shorter. It uses this to implement the maximum function, which it calls concept 13 (of course it has no concept of what we would name these functions). It is then able to use concept 13 and concept 4 together to implement the nth-largest-element function. And once I have the nth-largest-element function, I can simply iterate: I have a list, I iterate over its length, and I always take the nth largest number, and that sorts my list. So the program that sorts the list is super short in terms of this library we've built up.
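As a tiny illustration of how sorting falls out of the learned concepts just described, here is a hedged Python rendering; names like `concept_4` are made up to mirror the numbered concepts in the figure, and DreamCoder's actual programs live in a learned lambda-calculus DSL, not Python:

```python
# Hypothetical Python mirror of the learned concepts: 'concept_4' plays
# the role of filter, 'concept_13' is maximum built on top of it, and
# 'nth_largest' combines the two, as the video describes.

def concept_4(pred, xs):            # a discovered "filter"
    return [x for x in xs if pred(x)]

def concept_13(xs):                 # "maximum": the element for which
    return next(x for x in xs       # filtering "strictly larger" gives
                if not concept_4(lambda y: y > x, xs))  # an empty list

def nth_largest(n, xs):             # combine concept_13 and concept_4
    remaining = list(xs)
    for _ in range(n):              # drop the n largest elements first
        remaining.remove(concept_13(remaining))
    return concept_13(remaining)    # then take the max of what is left

def sort_list(xs):                  # iterate over the list's length
    return [nth_largest(i, xs) for i in range(len(xs))]

print(sort_list([3, 1, 2]))         # [3, 2, 1]; ascending is the reverse
```

The point is the compression: given the library, `sort_list` is a one-liner, even though its fully expanded form in primitives would be long.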
So this is our challenge in building this system: we need a system that is able to come up with programs to solve problems, that is able to build up a library, and that is able to efficiently search through that self-built library of concepts. DreamCoder does all of this at the same time, in three different stages.

Imagine you have a data set of tasks; the tasks here are these x's. A task can, as I understand it, be a single thing like list sorting, but it can also be the general class of list problems, which makes more sense here, so imagine we have the general class of list problems. The system maintains, as we said, this library L, and you can really imagine it as a programming library: it contains the functions a program can call, and it also contains all the primitives you give it. So there is going to be a bunch of primitives, like a + b, a - b, a * b (that's in terms of math; here we're dealing with lists, but you get the point), and there is also a section down here that the system can fill itself: it can define a function like 2a + b and is then able to call it. That's the library.

Now the system is given a task. The task here, as you can see, is a few examples of... I don't even know what it does here, do you? It kind of reverses the list and adds or subtracts one, something like this; I think it reverses the list and then adds one. That's the task we handle here; you can see all of these examples are reversing and adding. I've actually not solved this before, so I might be wrong. What we have to do is come up with a program that solves these tasks: if we give the left side as input, the right side should appear. And that is a hard problem, because we start with an empty program and build up a search tree, and every single one of those rules could be applied. Let's say these are not math operations but our list operations; I guess reversing is one of them, map is another, but you get the point. You put these rules here, and you could apply the first rule and build a program made up of the first rule, or of the second, or the third. If you already have a program, say a + b, you could again apply the first rule, which would give you a + (a + b), or apply the second rule, which would give you a + (a - b); I'm just substituting the second element here. This is obviously implemented in a functional programming language that makes all of this well defined; I'm just showing it in easy mode.
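Here is a minimal sketch of why this naive search explodes, with a made-up five-rule toy grammar; nothing here is DreamCoder's actual search code, just the branching behavior described above:

```python
from collections import deque
from itertools import islice

# Toy grammar: a program is a string; '?' marks a hole any rule can fill.
RULES = ["(+ ? ?)", "(- ? ?)", "(reverse ?)", "a", "b"]

def children(program):
    """Expand the leftmost hole with every rule."""
    return [program.replace("?", rule, 1) for rule in RULES]

def enumerate_programs():
    """Breadth-first search over the program tree, yielding complete
    (hole-free) programs roughly in order of size."""
    frontier = deque(["?"])
    while frontier:
        program = frontier.popleft()
        if "?" not in program:
            yield program
        else:
            frontier.extend(children(program))

for p in islice(enumerate_programs(), 8):
    print(p)   # a, b, (reverse a), (reverse b), (+ a a), (+ a b), ...
```

Even with five rules, the frontier grows multiplicatively with depth; a realistic library makes unguided enumeration hopeless, which is exactly why a guide is needed.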
But you get the point: I can arbitrarily search through this tree, applying each of those rules over and over, and you can already see this is going to give me a massive search tree. How am I going to solve these problems in such massive trees?

That's where the neural network comes in. It's actually the only part of the system that is machine-learned, as far as I understand it, or at least that is a neural network (machine learning isn't only deep learning). Searching through a really large discrete space is hard, but you as a human are able to do it. How? You have an intuition. You have some intuition that, for example, the lists appear to be the same length when you look at the problem, so maybe there's something with the ordering, maybe the first element corresponds to the first or to the last, something like that. You have some intuition about which rules you want to apply, and whenever you say "intuition" in a program, that's a prime place to put in a neural network. If you know AlphaGo or AlphaZero, that is exactly what they do: the model sits at a particular chess board and could make all of these different moves, but it cannot brute-force search the whole game tree, because that would be computationally too expensive. So it employs a neural network that says: this move looks promising off the bat, and this one doesn't, and so on. You only go down the promising ones, and from there you again have many options, but the neural network eliminates almost all of them and tells you which ones look promising. If the neural network is a good guide, that lets you quickly build a program that might solve the problem.

So you run a neurally guided search: you propose programs in decreasing order of likelihood under your model. This guiding model is a likelihood model: how likely is a program, given the task you're trying to solve? You try the most likely one first and then work your way down. You search for the best program, which here means the program that solves the task but is also the shortest. The intuition is always that a very short program is the better program, because it's a simpler explanation. The fewer steps you make in your search, the better the program; and the more the neural network likes the program, the better the program, because the neural network is trained for exactly this. So you come up with the best program for the task: you choose the program that maximizes the likelihood of the program given the task and the library. By Bayes' rule, this is proportional to the likelihood that the program generates the solution, which is just one or zero for a non-probabilistic program, times the probability of generating the program from your library, which corresponds to how many search steps you need to make. That's the wake phase: you try to solve the tasks from the training set by coming up with programs that solve them.
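Written out as a formula, this is my hedged reconstruction of the wake-phase objective just described (notation mine):

```latex
% Wake phase: for each task x, find the best program p under library L.
p^{*} \;=\; \operatorname*{arg\,max}_{p}\; P(p \mid x, L)
      \;\propto\; P(x \mid p)\; P(p \mid L)
% P(x | p): 1 if p reproduces the task's input-output examples,
%           0 otherwise (for a non-probabilistic program).
% P(p | L): prior of p under the library's grammar; roughly, shorter
%           programs (fewer search steps) are more likely.
```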
That gives you a data set of solved tasks. Initially you have a data set of tasks; you run it through the wake phase, and most of the time you're probably going to fail, but some of the time you're going to succeed. So you get a small data set of tasks where you've actually succeeded, and this data set is now the input to the sleep phases.

So what do the sleep phases do? The sleep phases are crucial here, because if you only have the guided search, that's already good, but it's not going to help you build more complex programs. Look at the list-sorting program down here: it is so large you can never get there with search, at least not in a reasonable time. You need to construct abstract concepts, because the short program is much shorter than the long program, and you can only get there by building up these useful concepts, by building up the library.

So in the sleep phase, we first build up the library, which means we take the data set we've constructed, all the things we could solve, look at our solutions, and compress them: grow the library to compress programs found during waking. Here we have a bunch of primitives, all the stuff we can do, and we look at which things we use often in combination with each other. Say we very often applied the first rule twice: we applied a + b and then applied a + b again, which amounts to a + (a + b) = 2a + b. Then we can say: since I use these two rules in conjunction so often, I'll make a new rule in my library that lets me do this in one step instead of two. So I add 2a + b to my library; since I know I need those two together often, it may as well be a single rule. In reinforcement learning this is sometimes called an option, a kind of higher-order action you can take, and there's a lot of work on discovering such options. What they do here is sort of the same thing: a compression step, compressing the programs found during the wake phase.

Here you can see an example of this. You have a program for task 1 and a program for task 2; these don't even need to come from the same task description, they just come from the same data set. You notice that you've used this subroutine, the orange subroutine, in both programs, so you extract that subroutine into the library. They have special algorithms for this, and it is not an easy thing: they have a very efficient way to search through these program trees, recognize commonalities, and extract them. They don't describe it in the paper, but it is not a trivial thing to do.

Assuming you can do this, you then expand your library. Mathematically, you expand the library with the routine that maximizes an objective with essentially two parts. The first part, the probability of the library itself, reflects how large the library is: you want to keep your library small, because if you could just add things at will, your search problem would again become too large, with all these rules you could apply; you only want to keep the best rules. The second part, which you maximize over refactorings of the programs you found, itself has two factors: the programs must actually solve the tasks you have (if it's probabilistic it's different, but let's just say the programs need to solve the tasks you've encountered), and the programs need to be reasonably short given your library. You've already seen that term in the wake algorithm; it's the same term, and the important part is "given your library". The sorting program up top isn't short, it's freaking long, but the same program given the library is really short, because I can use concept 15 from the library, and concept 15 can itself use concept 13 and concept 4. The gray box here is roughly the size of your library, because it holds all the concepts, and the orange box on the right is the length of the program itself given the library. These two things combined need to be small, which makes sense: you extend your library with rules that are themselves small in terms of the library, that are used often, and that solve a lot of problems, without growing the library too much.
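Here is my hedged reconstruction of that objective in one formula (notation mine, following the description above):

```latex
% Sleep (abstraction) phase: pick the library that stays small while
% making the refactored solutions to the solved tasks short.
L^{*} \;=\; \operatorname*{arg\,max}_{L}\; P(L)
      \prod_{x \,\in\, \text{solved tasks}}\;
      \max_{\rho \,\in\, \text{refactorings}(p_{x})}
      P(x \mid \rho)\; P(\rho \mid L)
% P(L):       prior penalizing large libraries (don't add rules at will).
% P(x | rho): the refactored program must still solve task x.
% P(rho | L): the refactored program should be short given L.
```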
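And as a toy illustration of the extraction step: the real refactoring search is much harder than literal subtree matching, as noted above, so this sketch only catches verbatim shared subtrees, but it conveys the gist of promoting a common subroutine to a concept:

```python
from collections import Counter

def subtrees(tree):
    """Yield every subtree of a nested-tuple program tree."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:          # tree[0] is the operator
            yield from subtrees(child)

def most_common_subtree(programs):
    """Find the compound subtree that is most frequent, then largest."""
    counts = Counter()
    for p in programs:
        for s in subtrees(p):
            if isinstance(s, tuple):
                counts[s] += 1
    return max(counts, key=lambda s: (counts[s], len(str(s))))

# Two solved programs sharing the subroutine (+ a (+ a b)), i.e. 2a + b:
p1 = ("*", ("+", "a", ("+", "a", "b")), "c")
p2 = ("-", ("+", "a", ("+", "a", "b")), "d")

concept = most_common_subtree([p1, p2])
print(concept)   # ('+', 'a', ('+', 'a', 'b')) -> becomes a new concept
```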
Now that you've come up with new rules, you go to the third phase, which they call dreaming. I think the first two phases would already be enough, and they do ablations where they leave out different parts, but there is something more you can do. Essentially you have a DSL for your problems, and if you have a DSL, you can just build programs at random: take a bunch of rules, apply them, and you de facto generate new problems to solve. Usually, during the wake phase, you have an input x and an output y and you ask which program solves this, and these come from the data set. But the programs themselves are built from a grammar, and that grammar is your library. So instead of doing the search-tree thing, I can simply apply a bunch of those rules: I start here, apply rule 1, then rule 2, then rule 5, and that gives me a program. I can apply that program to some input data from my training set, and it gives me some different output data, because it's a different program. But this now gives me another training data point. It's not from a real task, but I don't care: I can train my neural network to get better at finding programs, because in this case I know the program. That's the difference: in the wake phase I don't know what my program is; in the dream phase I construct the program, so I know what the neural network should suggest as its steps. Of all the options, it should suggest the first one here, then the third one, and so on.
So I can do supervised learning on my neural network, teaching it to search better in the space of programs, by coming up with my own programs and thereby generating my own training data. That's exactly what the dreaming phase does. In the dreaming phase we train this neural network, which they call the recognition model (the thing that guides the search), to predict the best programs for typical tasks under the current library. "Typical tasks" means either tasks that we sample ourselves or tasks with inputs from the training set where we produce the output ourselves. What I've just described they call fantasies: draw programs from the library, set the task x to the output of executing the program, and then learn: given x, I want the program p. Train the neural network to come up with p, since I know what the program was. Alternatively, I can again use the tasks I solved correctly during waking as a training set. I don't necessarily know that the program I found is the "correct" one; I just know it solves the examples I had, but that's good enough to act as a data set as well. We do that to keep ourselves grounded in reality. We can't just start dreaming up fantasies, because the whole thing is a cycle: we come up with a library, a language to describe the problems, then we use the language to generate new problems, then we use those generated problems to train our neural network. If we only did that, the danger is that we drift away from reality: our neural network learns very well to search through our imagined problems, but as soon as something real comes along, it's so different from what we imagined that the model is no longer viable. That's why we also use the replays, and I think they use a 50/50 mix of fantasies and replays. The reason they even use fantasies is data efficiency: you could do all of this without the fantasy/dreaming stage by simply training the neural network on successful replays, but that would be much less data efficient.

So it's sort of a house of cards that you build up, and I feel it depends on many things: on the primitives you give beforehand, on the tasks you choose and how well they are suited, and on the language itself, on how you can apply the rules. The paper is trying to tell us that the same basic algorithm can solve a lot of these tasks, but I still think the tasks are very suited to what the system does, and the system is built a lot with tasks like these in mind. That is what creates the opportunity to even do this dreaming: you can only do the dreaming thing if constructing problems out of your library L is useful for training your recognition model. If that weren't useful, this algorithm would probably work much worse. But as it turns out, for these problems it is useful.
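A minimal sketch of the fantasy idea under toy assumptions (names like `sample_program` and the `LIBRARY` dict are my own; DreamCoder's library is a learned lambda-calculus DSL, not Python functions): sample a program from the library, execute it on a training input, and the resulting (input, output, program) triple is a free supervised example for the recognition model.

```python
import random

# Toy library: named concepts mapping to callables over lists.
LIBRARY = {
    "reverse":   lambda xs: xs[::-1],
    "add_one":   lambda xs: [x + 1 for x in xs],
    "double":    lambda xs: [x * 2 for x in xs],
    "drop_last": lambda xs: xs[:-1],
}

def sample_program(max_len=3):
    """Sample a random composition of library concepts: a 'dream'."""
    names = random.choices(list(LIBRARY), k=random.randint(1, max_len))
    def program(xs):
        for name in names:
            xs = LIBRARY[name](xs)
        return xs
    return names, program

def make_fantasy(training_inputs):
    """One supervised example: (input, output) labeled with the program."""
    names, program = sample_program()
    xs = random.choice(training_inputs)
    return {"input": xs, "output": program(xs), "target_program": names}

random.seed(0)
print(make_fantasy([[3, 1, 2], [5, 4], [9, 8, 7, 6]]))
# The recognition model is then trained to predict `target_program`
# from the (input, output) pair -- training data we generated ourselves.
```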
Here you see another example of this abstraction step. (By the way, there is a little mistake here, but we're humans, we can work our way around it.) The wake phase solved both tasks by coming up with programs, and now the sleep/abstraction phase searches through a giant number of refactorings in order to come up with this primitive, the map primitive. They stress again that their compression algorithm, which they don't fully explain in this paper, is able to wade through a giant number of possible refactorings to find these common sub-algorithms. It's not as easy as simply comparing trees; it's actually much harder, because you can refactor programs in many different ways, especially in a sufficiently general programming language like this one. Ultimately it extracts this map primitive, and you can see that both programs immediately become a lot shorter: the left one becomes this, and the right one becomes this. Once you have the primitive, they become super easy.

In terms of experiments, they apply this, as we said, to the list tasks, but also to these drawing tasks. Here the primitives are not plus and minus and so on; instead the primitives are more like: you have a pen, it is at a point, and you can move the pen in very basic ways, I imagine. It's sort of a descriptive language for vector graphics. These are the LOGO graphics tasks: the model writes programs controlling a pen that draws the target picture. The task is simply: give me a program that draws this picture. You can see the tasks are fairly diverse; there is a lot you somehow have to get right in order to draw these.

When they analyze what the algorithm comes up with during training on these tasks, they find that it discovers these primitives. The library after training contains things like a semicircle function: the algorithm comes up with a function that takes a value r and draws a semicircle with the given radius, and you can see that the semicircle is larger depending on the value of r. It comes up with primitives like: I can draw a Greek spiral, I can draw an S-curve, and so on. It also comes up with higher-order functions, functions that take another function as input. Each row in panel B shows the same code executed with different parameters; each image in panel C shows the same code executed with different parameters and a different sub-program. In this case it's the radial symmetry function, which takes in a number n and a lower-order function, and replicates that lower-order function in a circular arrangement. It comes up with these things by itself, which is pretty cool.

At the bottom you can see what the dreaming phase comes up with. At the beginning, the programs the dreaming phase produces are fairly simple, and as the library grows, so grows the complexity of the programs it is able to come up with. This is sort of a built-in curriculum that the model has: it constructs problems from its own library.
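Here is a toy illustration of a higher-order drawing concept like the radial symmetry function just described; the pen model (programs return line segments, rotated about the origin) is my own simplification, not the paper's graphics DSL:

```python
import math

def rotate(point, angle):
    """Rotate a 2D point about the origin."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

def radial_symmetry(n, subprogram):
    """Higher-order concept: repeat `subprogram` n times around a circle,
    rotating by 2*pi/n between repetitions."""
    def program():
        segments = []
        for i in range(n):
            angle = 2 * math.pi * i / n
            for (a, b) in subprogram():
                segments.append((rotate(a, angle), rotate(b, angle)))
        return segments
    return program

def petal():
    """A lower-order sub-program: one short stroke away from the center."""
    return [((0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (1.2, 0.3))]

flower = radial_symmetry(8, petal)   # same code, parameter n = 8
print(len(flower()))                 # 16 segments: 8 rotated copies
```

Passing a different `subprogram` or a different n gives a different picture from the same code, which is exactly the behavior shown in panel C.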
Given that at the beginning the library is pretty primitive, the dreams don't do much at first, but over time they do. By the way, I think the pen starts at the dark color and goes to the light; the color coding shows where the pen starts and ends. I'm not sure of the exact direction, but yes, it starts at blue and finishes at pink. The figure illustrates the most interesting dreams found across five runs, both before and after learning, and it doesn't take that many iterations to find solutions to new problems.

But, and this is just my opinion: if you look at the problems and at the primitives the system comes up with, you can see that whoever came up with these tasks constructed them in much the same way as these primitives. Probably the person who created the tasks wrote a little DSL, saying: okay, I'm going to have a semicircle function, and it's going to be parameterized, and so on. So these problems are themselves generated by a DSL, or by a human who has this kind of DSL in mind and applies it. That's what I meant when I said the system is probably very geared towards these problems, because what it ends up doing is rediscovering how the data was generated. So the question is: will this work on data that wasn't generated this way? Or, alternatively, you can ask: does the universe have a structure like this? And there are good arguments for that, for instance that the system can discover physical laws.

By the way, it can also do the same thing with these tower-building tasks, and you can see the primitives it discovers are things like: build an arch, build a wall, build a pyramid. Those are primitives with arguments, and different arguments give you different structures. This is very cool, and these are the dreams down here, what it comes up with: pretty intricate combinations of those rules. Again, though, the question is whether this works on real-world data, whether real-world data behaves similarly; I don't know.

Here you can see a bunch of ablations, where they show that if you're missing, for example, the abstraction phase, you often won't get very far. In the LOGO graphics tasks you see pretty clearly that without abstraction or without dreaming you won't get very far. I feel that losing abstraction hurts the most, because if you can't abstract, you're only going to get so far in constructing programs: you can't construct large programs even with a very good neural network guiding your search.

Lastly, as I said, they go about discovering physical laws: they rediscover physical laws from numerical inputs. And that's what I mean: maybe the world actually is like this. At least, that's how we humans solve problems: we search for a simple explanation of the things we see. And science has been very successful at this. Newton's second law is literally this big, and it describes a whole lot of interesting physics, and similarly for lots of other physical laws.
It's kind of an unsolved mystery why everything is so simple, but given that it is, a program-search system like this might very well be appropriate. That being said, it probably can't solve computer vision or something like that out of the box, and they admit as much in the last part here. But just look at the primitives it discovers by itself. From the initial primitives you see right here, like map and zip (I don't even know what some of these are; I'm not into functional programming), it discovers the concepts of subtracting vectors, adding vectors, dividing by two, and so on. From those it constructs things like the square-root function, which is pretty remarkable, and from those it discovers things like the inverse-square law. You can then see that, for example, Newton's second law is only a combination of very few applications of library rules: an exceptionally short program given this library. The same goes for Coulomb's law: it's just two rules applied to the four inputs. If you expand it, it's a fairly large program, but because you have this library built up, it's a short program.

They do one other experiment where they give the system recursive programming tasks, list operations again, but they only give it the bare minimum: the primitives that, according to functional programming theory, as far as I understand it, are all you really need to solve these problems. What the system does is first discover the fold and unfold functions (fold is also called reduce, which I think is the more common name), and from these it builds all the other ones. And they say that if you look at functional programming theory, that's exactly what it says is necessary: given fold and unfold, you can build all the other list primitives. Again, you can see that the list-difference function is super duper short in terms of this library, once you've discovered the zip function; it expands to a program that is fairly long, one you would never reach even with neurally guided program search. And reaching it is only one problem; you then also have to recognize that it is actually the correct program. You do that as a human by looking at how short it is. The expanded form is not a short program: encoding the task as a hash table would be shorter, so with just two examples you might rather take the hash table. But given that you have this whole library, the zip-based program is actually much shorter than encoding it as a hash table.

All right. They say that much real-world data is far messier; a key challenge for program induction going forward is to handle more pervasive noise and uncertainty by leaning more heavily on probabilistic and neural AI approaches. Recent research has explored program induction with various hybrid neural-symbolic representations, and integrating these approaches with the library learning and bootstrapping capacities of DreamCoder could be especially valuable going forward. And I agree. If the episode isn't out yet: we had François Chollet on Machine Learning Street Talk, and if you know him, he came up with this ARC challenge.
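The fold/unfold point mirrors standard functional-programming folklore: once you have fold, many list functions are one-liners in terms of it. A hedged Python sketch of that idea (this mirrors the claim in the transcript, not DreamCoder's exact DSL):

```python
from functools import reduce

def fold(f, init, xs):
    """Right fold, expressed with Python's left-to-right reduce over the
    reversed list, so f(x, acc) sees elements front to back."""
    return reduce(lambda acc, x: f(x, acc), reversed(xs), init)

def my_map(f, xs):
    return fold(lambda x, acc: [f(x)] + acc, [], xs)

def my_filter(pred, xs):
    return fold(lambda x, acc: [x] + acc if pred(x) else acc, [], xs)

def my_length(xs):
    return fold(lambda _, acc: acc + 1, 0, xs)

print(my_map(lambda x: x * 2, [1, 2, 3]))             # [2, 4, 6]
print(my_filter(lambda x: x % 2 == 0, [1, 2, 3, 4]))  # [2, 4]
print(my_length([7, 8, 9]))                           # 3
```

Each definition is short *given* fold, but would be a sizeable recursive program from raw primitives; that gap is exactly the compression the library provides.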
ARC is almost the same thing DreamCoder does, except with these kinds of pictures, and you assume that humans have this thing called core knowledge, which they also allude to in this paper. Core knowledge is things like an intuitive understanding of physics, of object-ness, and so on. In one of the ARC tasks, there's kind of a thing here and a thing here, and the solution is that the thing appears again over here; you can already tell from one example that it's like a ball bouncing off a wall, and you get that by applying your core knowledge, so to speak. But this, again, is very clean data; in ARC, I think, everything is super clean data. And they say that if we want to apply this to real-world problems, and this is also something Chollet said in the podcast, which I invite you to listen to as soon as it's out, we're going to have to combine this search with perception. DreamCoder does the search part, a search over a DSL, and the DSL is learned. These are different layers: what deep learning usually does is perception, deep learning is really good at perception, that's current deep learning; and up here is what DreamCoder, or program synthesis approaches generally, do. We need a way to connect the two, to learn them jointly, because that's what you as a human somehow do: you learn your perception model and your logic model, your reasoning model, at the same time, or jointly in some way. We haven't exactly figured out how to do that yet, and I agree with this paper that it is probably going to be a very valuable thing to do.

All right, let me know what you think about this paper. I invite you to read it; it is high-level, but there are some other cool things in it, like DreamCoder learning regexes for different types of numbers, and so on. I think it's an interesting field; it's a bit different from just core machine learning. And that was it. I'll see you next time. Bye-bye.
Info
Channel: Yannic Kilcher
Views: 29,202
Keywords: deep learning, machine learning, arxiv, explained, neural networks, artificial intelligence, wake sleep algorithm, program synthesis, ai program synthesis, program synthesis deep learning, dreamcoder, dream coder, mit dream coder, bayesian program search, neural guided search, learning to sort a list, neural networks learn sorting, deep learning physical laws, deep learning symbolic reasoning, symbolic machine learning, symbolic artificial intelligence, deep learning tutorial
Id: qtu0aSTDE2I
Length: 48min 18sec (2898 seconds)
Published: Sun Apr 11 2021