All right, hello everyone. I'm here to talk about a piece that I recently wrote about biological evolution. It's called "Why Does Biological Evolution Work? A Minimal Model for Biological Evolution and Other Adaptive Processes", and we're going to go through the piece, and then hopefully there will be time for some questions and answers at the end. So this is the piece I wrote here, and we're going to start off going through it.

First we're going to talk about the model. Why does biological evolution work? And for that matter, why does machine learning work? Both are examples of adaptive processes that surprise us with what they manage to achieve. So what's the essence of what's going on? I'm going to concentrate here on biological evolution, but much of what I'll discuss is also relevant to machine learning; I plan to explore that in more detail elsewhere.

Okay, so what is an appropriate minimal model for biology? My core idea here is to think of biological organisms as computational systems that develop by following simple underlying rules. These underlying rules in effect correspond to the genotype of the organism; the result of running them is in effect its phenotype. Cellular automata provide a convenient example of this kind of setup. So let me show you an example; this one involves cells with three possible colors. The rules are shown over on the left, and the behavior they generate is shown on the right. Starting from a single orange cell, we see that from this seed a structure grows which in this case dies out after 51 steps. In a sense it's already remarkable that we can generate a structure that neither goes on forever nor dies out quickly, but instead manages to live, in this case, for exactly 51 steps. But let's say we start from the trivial null rule that makes any pattern die out immediately: can we end up adaptively evolving to the rule that we see here?
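The basic setup just described, a three-color (k = 3, r = 1) cellular automaton run from a single seed cell, with the lifetime of the pattern as the fitness, can be sketched in Python. The padding scheme, the seed color, and the finite cutoff standing in for "infinite lifetime" are my own illustrative choices here, not details taken from the piece:

```python
from itertools import product

K = 3  # three colors; a neighborhood is a (left, center, right) triple

def step(cells, rule):
    """One step of a k=3, r=1 cellular automaton, padding with white (0)
    so the pattern can grow by one cell on each side."""
    padded = [0, 0] + cells + [0, 0]
    return [rule[tuple(padded[i:i + 3])] for i in range(len(padded) - 2)]

def lifetime(rule, max_steps=500):
    """Number of steps until the pattern from a single seed cell dies out;
    None if it survives max_steps (treated as 'infinite' by the model)."""
    cells = [1]  # single non-white seed cell
    for t in range(1, max_steps + 1):
        cells = step(cells, rule)
        if not any(cells):
            return t
    return None

# the null rule: every neighborhood maps to white, so the seed dies at once
null_rule = {c: 0 for c in product(range(K), repeat=3)}
print(lifetime(null_rule))  # 1
```

A pattern that survives the cutoff is reported as None, which the adaptive process described next will treat as "infinite" and reject.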
Imagine making a sequence of randomly chosen point mutations, each changing just one outcome in the rule, like this. Then suppose that at each step, in a minimal analog of natural selection, we accept any mutation that makes the lifetime longer (though not infinite), or at least the same as before, and we reject any mutation that makes the lifetime shorter or infinite. It turns out that with this procedure we can indeed adaptively evolve to the rule we showed before, and here we're just showing the waypoints of progressively greater lifetimes. Different sequences of random mutations give different random sequences of rules, but the remarkable fact is that in almost all cases it's possible to make progress and routinely reach rules that give long-lived patterns. So let me show you examples of that. Here's an example that eventually reaches lifetimes of 107 and 723 steps; and here's one that reaches a lifetime of 723 steps, always producing elaborate morphological structure. Is it obvious that our simple process of adaptive evolution will be able to successfully wrangle things to achieve something like this, living 700-odd steps? No, it's not obvious. But the fact that it can do this seems to be at the heart of why biological evolution manages to work.

Looking at sequences of pictures like the ones we see here, we see that they often reflect different mechanisms for producing long lifetimes emerging in different sequences of rules. Typically we first see a mechanism in a simple form, like at the top here, and then, as the adaptive process continues, the mechanism gets progressively more developed, elaborated and built on, not unlike what we often appear to see in the fossil record of biological evolution. But let's drill down and look in a little more detail at what's happening in the simple model we're using. In the three-color, nearest-neighbor (that's k = 3, r = 1) cellular
automata we're considering, there are 26 (that's 3^3 - 1) relevant cases in the rule; there would be 27 if we didn't insist that the all-white neighborhood goes to a single white cell. Point mutations affect a single case in the rule, changing it to one of two (i.e. 3 - 1) possible alternative outcomes, so there are altogether 52 (that's 26 x 2) possible distinct point mutations that can be made to a given rule. For example, if we start from this rule, the results of possible single point mutations are these, and that's a sequence that goes on. Even with such point mutations there's usually considerable diversity in the behavior they generate. Here are the results for all of the point mutations in this case. In quite a few cases the pattern generated is exactly the same as the one from the original rule; in other cases it dies out more quickly, or doesn't die out at all, either becoming periodic or growing forever. And in this particular example, in just one case it achieves higher fitness, surviving longer: that's this case right there. So if we make a sequence of random mutations, many will produce shorter-lived or infinite-lifetime, in effect tumor-like, patterns, and these will be rejected, or in biological terms we can imagine they're selected out. But there can still be many neutral mutations that don't change the final pattern, or at least give a pattern of the same lifetime. At first we might think that these don't achieve anything, but actually they're critical in allowing single point mutations to build up to larger mutations that can eventually give long-lived patterns. So here's a sequence of those mutations, at first not making a change to the phenotype, the pattern that's generated, but changing the genotype, and eventually changing the phenotype as well. So now let's trace our whole adaptive-evolution process, like the one we saw at the beginning.
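The mutation counting and the accept/reject loop just described can be sketched as follows; again this is a sketch under my own assumptions (single-seed initial condition, and a step cap standing in for infinite lifetime), not the exact code behind the piece:

```python
import random
from itertools import product

def step(cells, rule):
    padded = [0, 0] + cells + [0, 0]
    return [rule[tuple(padded[i:i + 3])] for i in range(len(padded) - 2)]

def lifetime(rule, max_steps=150):
    """Steps until the single-seed pattern dies; None = survived the cap."""
    cells = [1]
    for t in range(1, max_steps + 1):
        cells = step(cells, rule)
        if not any(cells):
            return t
    return None

def point_mutations(rule):
    """All single point mutations: 26 mutable cases x 2 alternatives = 52."""
    muts = []
    for case in product(range(3), repeat=3):
        if case == (0, 0, 0):
            continue  # the all-white case is pinned to white
        for alt in range(3):
            if alt != rule[case]:
                new = dict(rule)
                new[case] = alt
                muts.append(new)
    return muts

def adaptive_evolve(n_attempts=2000, seed=0):
    """Accept a random mutation iff its lifetime is finite and not shorter."""
    rng = random.Random(seed)
    rule = {c: 0 for c in product(range(3), repeat=3)}  # start from the null rule
    best = lifetime(rule)  # the null rule has lifetime 1
    for _ in range(n_attempts):
        candidate = rng.choice(point_mutations(rule))
        lt = lifetime(candidate)
        if lt is not None and lt >= best:  # reject shorter or 'infinite'
            rule, best = candidate, lt
    return rule, best
```

Because neutral mutations (same lifetime) are accepted, the genotype keeps drifting even while the lifetime plateaus, which is exactly the mechanism discussed above.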
Here, the total number of point mutations involved in getting from one increasingly long-lived fitness waypoint to another is what we see here. We can then look at the underlying rules associated with these fitness waypoints; the numbers here just count the number of cumulative accepted mutations, ignoring ones that just go back and forth. One can get a sense of what's going on with the whole sequence of accepted mutations, of accepted rules, in this adaptive evolution by plotting them in a dimension-reduced rendering of this 26-dimensional rule space. So here's a plot of that. There are periods when there's a lot of just wandering around going on, with many mutations needed to make any progress, and then there are other periods when things go much faster and fewer mutations are needed. As another way to see what's going on, we can plot the maximum lifetime achieved so far against the total number of mutation steps made. We see lots of plateaus, including extremely long ones, in which no progress is made, punctuated by sometimes quite large sudden changes, often brought on by just a single mutation. If we include rejected mutations, we can see there's lots of activity going on even in the plateaus: every red dot is the result of a mutation that doesn't manage to make progress. In a sense we can think of any red dot that lies below a plateau as being like a mutation, or for that matter an organism, that doesn't make it and is selected out. It's worth noting that there can be multiple different phenotype patterns that occur across one of these plateaus; here's what we see in this particular example, and what we're seeing in these different blocks are the different phenotypes that occur across equal-fitness plateaus. But even between these phenotypically different cases there can be many genotypically different rules; in a sense this
isn't surprising, because usually only parts of the underlying rule are coding; other parts are non-coding, in the sense that they're not sampled during the generation of the pattern from that rule. For example, here's a picture that highlights, for each fitness-waypoint rule, which cells make use of a fresh case in the rule that hasn't already been sampled during the generation of the pattern. Every highlighted cell here is a cell that makes use of a new case in the underlying rule that generates the final pattern. We see that even in the last rule shown here, only 18 of the 26 relevant cases in the rule are ever actually sampled during the generation of the pattern, at least from the particular single-red-cell initial condition that's used. This means that eight cases in the rule are undetermined from the phenotype, implying that there are 3^8, or 6561, possible genotypes, i.e. rules, that will give the same result as the final result here.

Okay, so far we've mostly been talking about one particular random sequence of mutations, but what happens if we look at many possible such sequences? Here's how the longest lifetime, in effect the fitness, increases for 100 different sequences of random mutations. What's perhaps most notable here is that it seems as if these adaptive processes indeed don't get stuck. It may take a while, with the result that there are long plateaus, but these pictures suggest that eventually adaptive evolution will find a way, and one will get to rules that show longer lifetimes. We can see another reflection of that if we look at the progressive development of the distribution of lifetimes as a function of the number of adaptive-evolution steps: we get this histogram, which is heading to the right, to longer lifetimes, and away from the left, which corresponds to short lifetimes. Okay, let's go on and talk about the multi-way graph of all possible mutation histories.
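Before moving on, a concrete aside on the coding vs. non-coding distinction above: which rule cases actually get sampled is easy to track while growing the pattern. Here's a sketch, using the null rule and my own single-seed setup purely for illustration:

```python
from itertools import product

def used_cases(rule, max_steps=200):
    """Which rule cases (the 'coding' parts of the genotype) get sampled
    while growing the pattern from a single seed cell."""
    cells, used = [1], set()
    for _ in range(max_steps):
        padded = [0, 0] + cells + [0, 0]
        nxt = []
        for i in range(len(padded) - 2):
            case = tuple(padded[i:i + 3])
            used.add(case)
            nxt.append(rule[case])
        cells = nxt
        if not any(cells):
            break
    return used

null_rule = {c: 0 for c in product(range(3), repeat=3)}
sampled = used_cases(null_rule)
# relevant cases never sampled are 'non-coding': they are free to vary with
# no effect on the phenotype, so 3**free rules share this phenotype
free = 26 - len(sampled - {(0, 0, 0)})
print(len(sampled), free)  # 3 23
```

For the evolved rule discussed above, 8 of the 26 relevant cases were unsampled, giving 3^8 = 6561 equivalent genotypes; for the null rule here, 23 cases are unsampled.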
We've always been talking about particular paths of adaptive evolution, determined by particular sequences of random mutations. But a powerful way to get a more global view of the process of adaptive evolution is to look, in the spirit of our Physics Project, the ruliad and so on, not just at individual paths of adaptive evolution, but instead at the multi-way graph of all possible paths. In making a correspondence with biology, multi-way graphs give us a way to talk about adaptive evolution not just of individual sequences of organisms, but also of populations. To start our discussion, let's consider not the three-color cellular automata we just talked about, but instead nearest-neighbor two-color cellular automata, for which there are just 128 possible relevant rules. How are these rules related by point mutations? Well, we can construct a graph of every possible way that one rule from the set can be transformed to another by a single point mutation; here's that graph. (If we imagine 4-bit rules rather than these 7-bit rules, there are only 16 possible ones, and we can readily see that the graph of possible mutations has the form of a Boolean hypercube.) Let's say we start from the null rule, the rule which is all white, down in the bottom right-hand corner there. Then we enumerate the rules obtained by a single point mutation, and therefore directly connected to the null rule in this graph, and we see what behavior they can produce, say from a particular initial condition. The caption to each of those evolutions is the sequence of cases, the sequence of bits, that represents the rule. Some of these rules we can view as making progress, in the sense that they yield patterns with longer lifetimes (not impressively longer: just two rather than one), but other rules make no progress, or generate patterns that live forever. So, keeping only mutations that don't lead to shorter or infinite lifetimes, we can construct a multi-way
graph that shows all possible mutation paths, and here's that graph. Although this is a very small graph, with just 15 rules appearing, we can already see hints of some important phenomena. There are fitness-neutral mutations that go both ways, but there are also plenty of mutations that go only one way, because the other way they would decrease fitness. And a notable feature of this graph is that once one's committed to a particular part of it, say the left or the right, one often can't reach a different one, suggesting an analogy to the existence of different branches in the tree of life. Moving beyond two-color nearest-neighbor, that is k = 2, r = 1, cellular automata, we can consider k = 2, r = 3/2 ones. A typical such cellular automaton is this one here. For k = 2, r = 1 there are a total of 128 (that's 2^(2^3 - 1)) relevant rules; for k = 2, r = 3/2 there are a total of 32,768 (2^(2^4 - 1)) rules. So, starting from the null rule, and again using a particular initial condition, here are a few specific examples of evolution paths that we get for cellular automata of this type. And here's the beginning of the multi-way graph; it doesn't look quite right because we can't see some gray outlines, but it's okay, we get the basic point. This is the beginning of the multi-way graph for k = 2, r = 3/2 rules, showing rules reached by up to two mutations, one and two levels from the original null rule. This graph contains many examples of fitness-neutral sets: rules that have the same fitness and that can be transformed into each other by mutations back and forth. A few examples of these fitness-neutral sets are these ones here. In the first case here, the morphology of the phenotypic patterns is the same for all genotypic rules in the fitness-neutral set, but in the other cases there are multiple morphologies within a single fitness-neutral set. If we included all individual rules, we'd get a complete
k = 2, r = 3/2 multi-way graph with a total of 1884 nodes. But if we include just one representative from every fitness-neutral set, we get a more manageable multi-way graph, here with 86 nodes. Keeping only one representative from pairs of patterns that are related by left-right symmetry, we get a still simpler graph, now with a total of 49 nodes. There's quite a lot of structure in this graph, with both divergence and convergence of possible paths, but overall there's a certain sense in which different sections of the graph separate into distinct branches, in which adaptive evolution in effect pursues different ideas about how to increase fitness, i.e. in this case the lifetime of patterns. We can think of the fitness-neutral sets that appear here as representing a kind of equivalence class of rules. There's quite a range of possible structures to these sets: from ones with a single element, to ones with many elements but few distinct morphologies, to ones with different morphologies for every element. So here's a representation of all the fitness-neutral sets for k = 2, r = 3/2 cellular automata; the little graphs on the left show the relationships between the elements in each fitness-neutral set, and on the right are the morphologically different examples of phenotypes in that fitness-neutral set. What about larger spaces of rules? For k = 2, r = 2, that's two neighbors on each side, there are altogether about 2 billion (2^(2^5 - 1)) relevant rules, but if we choose to look only at left-right symmetric ones, this number is reduced to 524,288 (2^19). So let's look at some examples of sequences of rules produced by adaptive evolution in this particular case of symmetric rules, starting from the null rule and allowing only mutations that preserve symmetry. Now we're again just using a single black cell as the initial condition to
grow our patterns. Once again we can identify fitness-neutral sets, though this time, in the vast majority of cases, the patterns generated by all members of a given set are morphologically the same; here are some fitness-neutral sets in this case. If we reduce out the fitness-neutral sets, we can then compute the complete transitively reduced multi-way graph for symmetric k = 2, r = 2 rules, containing a total of 60 nodes, and here's the result. By transitive reduction I mean that there are also cases where mutations lead directly from a phenotype high in this picture to one lower down, but we're only showing the slowest path to make that transition, the path through all the intermediate steps, rather than showing all the possible paths. By reducing out fitness-neutral sets we're creating a multi-way graph in which each edge represents a mutation that makes progress in increasing fitness. But actual paths of adaptive evolution, based on random sequences of mutations, can do any amount of rattling around within fitness-neutral sets, not to mention trying mutations that decrease fitness before reaching mutations that make progress. This means that even though the reduced multi-way graph we've drawn suggests that the minimum number of steps, i.e. mutations, needed to adaptively evolve from the null rule to any other rule is just nine, it can actually take any number of steps, because of the rattling around within fitness-neutral sets. So here's an example of a sequence of accepted mutations in a particular adaptive-evolution process, with the mutations that make progress highlighted, and the numbers in between indicating rejected mutations. We can see rattling around in fitness-neutral sets, with morphologies that are the same, or that cycle around. So that's one way to reach the final pattern.
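The transitive reduction just mentioned is a generic graph operation: drop every edge that's implied by a longer path, keeping only the step-by-step connections. For small graphs a brute-force sketch is enough (the toy DAG here is my own example, not the actual rule graph):

```python
def transitive_reduction(edges):
    """Remove every edge of a DAG that is implied by a longer path,
    keeping only the 'slowest' step-by-step connections."""
    nodes = {n for e in edges for n in e}
    succ = {n: {b for a, b in edges if a == n} for n in nodes}

    def reachable(a, b, skip):
        # is b reachable from a without using the direct edge `skip`?
        stack, seen = [a], set()
        while stack:
            n = stack.pop()
            for m in succ[n]:
                if (n, m) == skip or m in seen:
                    continue
                if m == b:
                    return True
                seen.add(m)
                stack.append(m)
        return False

    return {e for e in edges if not reachable(e[0], e[1], e)}

# toy multiway-style DAG: the 'shortcut' edge 1->3 is implied by 1->2->3
print(sorted(transitive_reduction({(1, 2), (2, 3), (1, 3)})))  # [(1, 2), (2, 3)]
```

For a DAG this is unique: an edge survives exactly when no longer path connects its endpoints.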
But there are also plenty of others, potentially involving many fewer mutations. Indeed, one can determine from the multi-way graph that the absolutely shortest path to reach that endpoint is this one here, involving this sequence of rules. We start from the rule here, and at each step we make a single point mutation (though, because of symmetry, two bits can sometimes be changed). The first mutations don't end up changing the phenotypic behavior (remember, it looked like this), but after a while enough mutations (here, six) have built up that we get morphologically different behavior, and after just three more mutations we end up with our final pattern. Our original random sequence of mutations gets to the same result, but in a much more tortuous way, doing a total of 169 mutations, which often cancel each other out. So in drawing a multi-way graph we're defining what evolutionary paths are possible. But what about probabilities? If we assume that every point mutation is equally likely, then we can in effect analyze the flow in the multi-way graph and determine the ultimate probability that each rule will be generated, with higher generation probabilities shown redder here (though it's actually not so obvious in this projected image that some reds are redder than others). All right, let's go on and talk about the fitness landscape. Multi-way graphs like the ones here give a very global view of adaptive evolution, but in understanding the process of adaptive evolution it's also often useful to think somewhat more locally. We can imagine that all the possible rules are laid out in a certain space, and that adaptive evolution is trying to find appropriate paths in that space. Potentially we can suppose that there's a fitness landscape defined on the space, and that adaptive evolution is trying to follow a path that progressively ascends to higher peaks of fitness. So let's consider again the very first example that I gave here,
of adaptive evolution in the space of three-color cellular automata; just as a reminder, that's this case here. At each step in this adaptive evolution there are 52 possible point mutations that can be made to the rule, and we can think of each of these mutations as corresponding to making an elementary move in a different direction in the 26-dimensional space of rules. So here's a visual representation of what's going on, based on the particular path of adaptive evolution from the example I just gave. What we're showing here is, in effect, the sequence of decisions being made to get us from one fitness waypoint to another. Different possible mutations are represented by different radial directions, with the length of each line being proportional to the fitness achieved by doing that mutation. At each step there's a gray disk (you can't see it very well here) that represents the previous fitness; most of the lines go out to more or less the position of that gray disk. Many possible mutations lead to lower-fitness outcomes, which are effectively within the disk, but there are at least some mutations that have higher fitness and escape the disk. In the multi-way graph we trace every mutation that leads to higher fitness, but for the particular path of adaptive evolution we're talking about here, we imagine that we always just pick one random mutation from this set, indicated here by a red line. (Later we'll talk about different strategies for the adaptive-evolution process.) These radial icons can be thought of as giving a representation of the local derivative at each point in the space of rules, with longer lines corresponding to directions with larger slopes up the fitness landscape. But what happens if we want to knit together these local derivatives to form a picture of the whole space? Needless to say,
it's complicated. As a first example, consider just the k = 2, r = 1, i.e. elementary, cellular automaton rules. There are a total of 128 relevant such rules, and they can be thought of as connected by point mutations that form a seven-dimensional Boolean hypercube. We already talked about the fact that of the 128 relevant rules, only 15 appear in adaptive-evolution processes; the others are in effect never selected, because they represent lower fitness. But now we can ask where the rules that do appear lie on the whole hypercube, and this is a picture of that. Each node here represents a rule, with the size of the highlighted nodes indicating their corresponding fitness, computed from lifetime with a particular initial condition; the node shown in green corresponds to the null rule. Rendering this in 3D, showing fitness as height as well as size of sphere, we get what we can consider a fitness landscape. And now we can think of our adaptive evolution as proceeding along paths that never go to nodes with lower height on this landscape. If we look at k = 2, r = 3/2 rules, we get a more filled-in landscape; this is what that looks like, and adaptive evolution must trace out paths that never go down, like this one, on this landscape. Along such a path we can make derivative pictures like the ones we made before to represent the local topography around each point, indicating which of the possible upward directions on the landscape is taken; so this is a picture of that. The rule space over which our fitness landscape is defined is ultimately discrete, and effectively very high-dimensional: 15-dimensional for this case of k = 2, r = 3/2 rules. It's quite challenging to produce an interpretable visualization of that in 3D. We'd like to lay out our rendering of the rule space so that rules which differ by just one mutation are a fixed elementary 2D distance apart. In general this won't be possible, but we're trying to at least approximate it.
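The seven-dimensional hypercube structure mentioned above is easy to verify directly: represent each k = 2, r = 1 rule as a 7-bit outcome tuple (with the all-white case pinned to white), and single point mutations connect each rule to exactly 7 neighbors. A quick sketch:

```python
from itertools import product

# k=2, r=1 rules as 7-bit outcome tuples (the all-white case is pinned to 0);
# two rules are adjacent iff they differ by a single point mutation
rules = list(product((0, 1), repeat=7))

def neighbors(rule):
    """The 7 rules reachable by flipping exactly one outcome bit."""
    return [rule[:i] + (1 - rule[i],) + rule[i + 1:] for i in range(7)]

n_edges = sum(len(neighbors(r)) for r in rules) // 2
print(len(rules), n_edges)  # 128 448
```

128 nodes, each of degree 7, give 128 x 7 / 2 = 448 edges: the 7-cube.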
We can do that by finding a reasonably good layout from the underlying mutation graph, and that's what we're doing in a case like this. Using this layout, we can in principle make a fitness-landscape surface by interpolating between the discrete points we show here. It's not completely clear how meaningful this is, but it's perhaps useful in engaging our spatial intuition to be able to see things like this. We can also try machine learning and dimension reduction, operating on the set of rule vectors, i.e. outcome lists, that won't be rejected in our adaptive-evolution process, and the results are, well, perhaps slightly better. By the way, if we use this dimension reduction for the rule space, here's how the behavior of the rules lays out; and for comparison, here's a feature-space plot based just on the visual appearance of the patterns. So the first one was based on actual mutations; the second is simply based on visual appearance. Well, let's talk about the whole space: exhaustive search versus adaptive evolution. In adaptive evolution we start, say, from the null rule, and then we make random mutations to try to reach rules with progressively larger fitness. But what about just exhaustively searching the complete space of possible rules? The number of rules increases rapidly, becoming unmanageably large, but some cases are definitely accessible: for example, there are just 524,288 symmetric k = 2, r = 2 rules. Ultimately, though, there are just 77 distinct phenotypic patterns that appear, with varying lifetimes and varying multiplicity. In this case we're showing the multiplicity, and the numbers underneath here are directly associated with the number of unused bits in the rule when we generate the patterns shown; for example, there are no unused bits in the cases with multiplicity one. So how do these exhaustive results, which we get by sampling the whole space of possible rules, compare with what's generated in the multi-way graph from adaptive evolution?
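As a miniature analog of this kind of exhaustive search, one can enumerate the tiny k = 2, r = 1 space completely; the single-black-cell initial condition and the lifetime cap are my own choices for illustration, and the real searches in the piece use much larger spaces:

```python
from itertools import product

CASES = [c for c in product((0, 1), repeat=3) if any(c)]  # 7 non-all-white cases

def lifetime(bits, max_steps=100):
    """Lifetime from a single black cell for a k=2, r=1 rule given as a
    7-tuple of outcomes; None if the pattern survives the cap."""
    rule = dict(zip(CASES, bits))
    rule[(0, 0, 0)] = 0
    cells = [1]
    for t in range(1, max_steps + 1):
        padded = [0, 0] + cells + [0, 0]
        cells = [rule[tuple(padded[i:i + 3])] for i in range(len(padded) - 2)]
        if not any(cells):
            return t
    return None

# exhaustively search all 2**7 = 128 relevant rules
lifetimes = [lifetime(bits) for bits in product((0, 1), repeat=7)]
finite = [t for t in lifetimes if t is not None]
print(len(lifetimes), len(finite))
```

Of the 128 rules, exactly 16 have lifetime 1: the ones whose three neighborhoods sampled on the first step all map to white, with the other four bits free, a small instance of the unused-bits multiplicity discussed here.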
Well, they're almost the same in this particular case, but with two extra cases in the exhaustive version, generated by rules of the following form (you can't see it well, but there are a lot of gray cases in these rules, which are entries that don't matter). Why don't rules like this ever appear in our adaptive evolution? The reason is that there isn't a chain of point mutations starting from the null rule that can reach these rules without going through rules that will be rejected by our adaptive-evolution process. If we draw a multi-way graph that includes every possible acceptable rule, then we'll see a separate part of the graph, with its own root, that contains the rules that can't be reached by our adaptive evolution from the null rule. Here's how that works, and you can see those unreachable rules with the red connections there. Now, if we look at all symmetric k = 2, r = 2 rules, let's look at just the distribution of lifetimes we get. Here it is; the maximum we saw above is 65, and the overall distribution follows roughly a power law, with exponent around -3. What we saw above is that not all rules make use of all their bits, i.e. outcomes, in generating a particular phenotypic pattern, and what we see is that the larger the lifetime that's achieved, the more bits tend to be used; so this is the number of bits used as a function of the lifetime achieved. In a sense this isn't surprising, because, as we'll discuss later, we can expect to need more bits in the program to specify more elaborate behavior, or in particular behavior that embodies a larger number for the lifetime. So what about general, i.e. not necessarily symmetric, k = 2, r = 2 rules? There are 2^31, or about two billion, of these, and if we exhaustively search through them we find about 75 million that have finite lifetime, and the
distribution of lifetimes in that case is this. Again it's roughly a power law, now with an exponent around -3.5. So what do the actual patterns look like? Here are the 100 patterns with the longest lifetimes (in all the asymmetric cases here there are also rules giving the left-right flipped patterns), and the numbers underneath in each case are the lifetimes achieved with those particular patterns. It's really interesting to see the variety of qualitatively different ideas being used by different rules. Some of them, like that one there, we might somehow imagine could have been constructed specifically for the purpose of having their particular long lifetimes. But other ones, like this long one here with lifetime 308, somehow seem more coincidental, behaving in an apparently random way and then just happening to die out after a certain number of steps. Well, since we found these rules by exhaustive search, we know they're the only possible ones with such long lifetimes, at least with k = 2, r = 2. So then we can infer that the ornate structures we see are in some sense necessary to achieve the objective of, say, having a finite lifetime of more than 100 steps. That means that if we go through a process of adaptive evolution and achieve a lifetime above 100 steps, we see a complex pattern of behavior not because of complicated choices in our process of adaptive evolution, but rather because to achieve such a lifetime one has no choice but to use such a complex pattern. In other words, the complexity we see is a reflection of computational necessity, not historical accidents of adaptive evolution. Note also that, as we'll talk about in more detail later, there are certain behaviors we can get and others that we cannot: for example, there's a rule that gives lifetime 308, but none that gives lifetime 300, though if we used more complicated initial conditions, or a more complicated family of rules, we absolutely could find such a rule.
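A crude way to estimate this kind of power-law exponent from lifetime data is a least-squares fit of log-count against log-lifetime; here's a sketch, with synthetic counts standing in for the real distribution:

```python
import math
from collections import Counter

def powerlaw_slope(samples):
    """Least-squares slope of log(count) against log(value): a rough
    estimate of the exponent of an approximate power-law distribution."""
    counts = Counter(samples)
    pts = [(math.log(v), math.log(c)) for v, c in counts.items()]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

# synthetic lifetimes whose counts fall off exactly like value**-3
samples = [1] * 8000 + [2] * 1000 + [4] * 125
print(round(powerlaw_slope(samples), 3))  # -3.0
```

Real lifetime data would be messier (log-binning and maximum-likelihood estimators are more robust), but this captures the idea behind quoting exponents like -3 and -3.5.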
Much as we saw in the symmetric k = 2, r = 2 case, almost all long lifetimes require using all the available bits in the rule. This plot shows, as a function of lifetime (the axis should have been labeled "lifetime" here), how many of the available bits in the rule have been used, and mostly, once we get to long lifetimes, all 32 bits in the rule have been used. But needless to say, as is always the case with these adventures in the computational universe, there's an exception: a pair of rules with lifetime 84 where the outcome for the all-black (five black cells) case doesn't matter. That's this particular rule here, and the first cell is supposed to be gray, indicating that it doesn't matter what its value is. (These cells, remember, represent the outcomes for the succession of possible five-cell neighborhoods in the cellular automaton.) But okay, can these long-lifetime rules be reached by single-mutation adaptive evolution from the null rule? Rather than trying to construct the whole multi-way graph for the general k = 2, r = 2 case, which might have billions of nodes, starting from the null rule, we can instead construct what's in effect an inverse multi-way graph, in which we start from a given rule and then successively make all single point mutations that reach rules with progressively shorter lifetimes. Here's the result of doing that. What we see is that, at least in this case, this procedure never reaches the null rule; the furthest it gets is to lifetime-2 rules, and among those the closest to the null rule are these rules here. But it turns out there's no way to reach these two-bit rules by a single point mutation from any of the 26 one-bit rules that aren't rejected by our adaptive-evolution process. In fact, this isn't just an issue for this particular long-lifetime rule; it's something quite general among k = 2 rules.
Indeed, constructing the forward multi-way graph starting from the null rule, we find we can only ever reach lifetime-1 rules. Ultimately this is a particular feature of rules with just two colors, and it's specific to starting with something like the null rule, which has lifetime 1, but it's an illustration of the fact that there can be even large swaths of rule space that can't be reached by adaptive evolution with point mutations. Well, what about symmetric k = 2, r = 2 rules? To maintain symmetry we have to deal with mutations that change not just one but two bits, and this turns out to mean that, except in the cases we talked about before, the inverse multi-way system starting from long-lifetime rules always successfully reaches the null rule; here's an example. There's something else to notice here: looking at this graph, we see that there's a way to get, with just one two-bit mutation, from a lifetime-1 rule to a lifetime-65 rule. That's the line over on the left, starting from the lifetime-65 pattern at the top, and this is the mutation that's involved: the first rule generates the lives-one-step phenotype we show on the left here, and the second rule generates the lives-65-steps phenotype on the right. We didn't see this in the multi-way graph we made above because we'd done that process of transitive reduction on the graph, so we'd gotten rid of these direct paths in favor of the paths that go through all the intermediate steps. If we don't do that, we find that a few large lifetime jumps are possible, and we can see that if we plot the possible lifetimes before and after a single point mutation. The cases where the lifetime is the same after as before would be on the line x = y here, and we can see that there are some points with a short lifetime before, a small x value, that jump to a high y value; those correspond to cases where there was a short lifetime before the mutation and a very long lifetime after it.
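The symmetry bookkeeping above can be made concrete: neighborhoods pair up under left-right reversal, a symmetric rule assigns one outcome per reversal class, and a symmetric "point mutation" therefore flips either one bit (a palindromic neighborhood) or two (a mirror pair). Counting classes reproduces the 2^19 symmetric k = 2, r = 2 count from earlier, along with the analogous k = 3, r = 1 count:

```python
from itertools import product

def symmetric_classes(k, width):
    """Neighborhoods grouped into left-right reversal classes; a symmetric
    rule assigns one outcome per class, and a symmetric 'point mutation'
    flips one bit (palindromes) or two bits (mirror pairs) at once."""
    return {min(c, c[::-1]) for c in product(range(k), repeat=width)}

# symmetric k=2, r=2: 20 classes, one (all-white) pinned -> 2**19 = 524,288 rules
print(len(symmetric_classes(2, 5)), 2 ** 19)  # 20 524288
# symmetric k=3, r=1: 18 classes, one pinned -> 3**17 = 129,140,163 rules
print(len(symmetric_classes(3, 3)), 3 ** 17)  # 18 129140163
```

The class counts follow directly: of the 32 five-cell binary neighborhoods, 8 are palindromes, giving 8 + 24/2 = 20 classes; of the 27 three-cell ternary neighborhoods, 9 are palindromes, giving 9 + 18/2 = 18 classes.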
value; those correspond to cases where there was a short lifetime before the mutation and a very long lifetime after it. If we go beyond k = 2, r = 2 rules, we can consider symmetric k = 3, r = 1 rules, of which there are 3^17, or about 129 million. The distribution of lifetimes in this case looks like this: again it's roughly a power law, again with exponent around −3.5. Now the maximum lifetime we find is not 308, which is what we got for the symmetric k = 2, r = 2 rules, but instead 2,194. Here are some examples of the long-lived rules of this type, and we've got even longer ones here; the champion is the one on the right, which lives 2,194 steps before dying out. So once again there are different ideas on display here, with a few curious examples of convergence. For instance, in this case there are actually two rules with lifetimes 989 and 990 on the bottom row, as well as 1068 and 1069, which give essentially the same patterns; if you look carefully, they've exchanged colors, and they have one prefatory step at the very beginning that's different, but otherwise they've in a sense converged to the same phenotype. So what about general, not-necessarily-symmetric k = 3 (three-color), r = 1 (nearest-neighbor) rules? There are too many of these to search exhaustively, but if we do a kind of directed random sampling, we get lots of long-lifetime examples. Here are some of those, and we see once again very different ideas on display about how to live a long time. The tail of very long lifetimes in this case extends even further; here are examples of what happens there. The champion is a rule whose pattern lives 10,863 steps, and it's a little easier to see what happens if we just visualize it in sections.
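The rule counts quoted here are easy to check. For symmetric rules, mirror-image neighborhoods must share an output, so the independent cases are the reflection-equivalence classes of neighborhoods; pinning the all-white case to white then gives 3^17 for symmetric k = 3, r = 1, matching the 129 million figure. A quick sketch, with that convention as an assumption:

```python
from itertools import product

def num_rules(k, r, symmetric=False):
    """Count CA rules with k colors and radius r, all-white case pinned to white."""
    hoods = list(product(range(k), repeat=2 * r + 1))
    if symmetric:
        cases = len({min(h, h[::-1]) for h in hoods})   # reflection classes
    else:
        cases = len(hoods)
    return k ** (cases - 1)          # minus 1: the all-white case is fixed

print(num_rules(2, 2, symmetric=True))   # symmetric k = 2, r = 2: 2^19
print(num_rules(3, 1, symmetric=True))   # symmetric k = 3, r = 1: 3^17 = 129140163
print(num_rules(3, 1))                   # general k = 3, r = 1: ~10^12
print(num_rules(4, 1))                   # general k = 4, r = 1: ~10^38
```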
Sampling 100 steps out of every 2,000, as well as the piece at the end, we see an elaborate alternation in this rule between periodic and seemingly random behavior; but none of it gives any obvious clue of the remarkable fact that after 10,863 steps the whole pattern will die out. We see its very end down there on the bottom right. Next, the issue of undecidability. As our example criterion for the fitness of cellular automaton rules, we've used the lifetimes of the patterns they generate, assuming that if the patterns don't terminate at all, they should be considered to have fitness zero. But how can we tell if a pattern is going to terminate? We saw before patterns that live a very long time but do eventually terminate. So here are the first 100 steps of a few k = 3, r = 1 symmetric rules. What will happen with these patterns? We know from what we see here that none of them have lifetimes less than 100 steps, but what would allow us to say more? In a few cases we can see that the patterns are periodic, or have obvious repeating structures, which means they'll never terminate. But in other cases there's no obvious way to predict what will happen. Explicitly running the rules for another 100 steps, we discover some more outcomes. Going to 500 steps, though, there are some surprises: the first rule, rule (a) here, becomes periodic after 388 steps, and other rules on the bottom row terminate after respectively 265 and 377 steps. We didn't see that coming from the behavior above. But the question is: is there a way to systematically say what will happen in the end with all the remaining rules? The answer is that in general there's not; it's something that must be considered undecidable by any finite computation.
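The kind of empirical classification being used here (run for a while and report whether the pattern dies, becomes periodic, or remains unknown) can be sketched as follows. This assumes a k = 3, r = 1 rule encoded as 27 outputs indexed by the base-3 neighborhood code, with the all-white case mapping to white; stripping the white background compares states up to translation, so uniformly moving repetitive patterns also register as periodic:

```python
def step_ca(state, rule, k=3):
    """One step of a k-color, r = 1 CA on a background-stripped tuple of cells."""
    padded = (0, 0) + state + (0, 0)
    new = [rule[k * k * padded[i] + k * padded[i + 1] + padded[i + 2]]
           for i in range(len(padded) - 2)]
    while new and new[0] == 0:
        new.pop(0)
    while new and new[-1] == 0:
        new.pop()
    return tuple(new)

def classify(rule, k=3, max_steps=500):
    """('dies', lifetime), ('periodic', period), or ('unknown', None) after max_steps."""
    state, seen = (1,), {(1,): 0}
    for step in range(1, max_steps + 1):
        state = step_ca(state, rule, k)
        if not state:
            return ('dies', step)
        if state in seen:          # same stripped state again: it can never die out
            return ('periodic', step - seen[state])
        seen[state] = step
    return ('unknown', None)
```

The 'unknown' outcome is exactly where the undecidability discussed next bites: no finite amount of extra running is guaranteed to resolve it.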
Given how comparatively simple the cellular automaton rules we're considering are, we might have assumed that with all our sophisticated mathematical and computational methods we'd always be able to jump ahead of them and figure out their outcome without the computational effort of explicitly running each step. But the Principle of Computational Equivalence suggests that pretty much whenever the behavior of these rules isn't obviously simple, it will in effect be of equal computational sophistication to any other system, and in particular to any methods that we might use to predict it. And the result is the phenomenon of computational irreducibility that I talk about a lot, which implies that in many systems, presumably including most of the cellular automata here, there isn't any way to figure out their outcome much more efficiently than by explicitly tracing each of their steps. So this means that to know what will happen in the end, after an infinite number of steps, might take an unlimited amount of computational effort; or, in other words, it must be considered effectively undecidable by any finite computation. As a practical matter, we might look at the observed distribution of lifetimes for a particular type of cellular automaton and become pretty confident that there won't be longer finite lifetimes for that type of cellular automaton. But for the k = 3, r = 1 rules we talked about before, we might have been fairly confident that a few thousand steps was the longest lifetime that would ever occur, until we discovered the 10,863-step example. So let's say we run a particular rule for 10,000 steps and it hasn't died out. How can we tell if it never will? Well, we have to construct a proof of some kind. That's easy to do if we can see that the pattern becomes, say, completely periodic, but in general computational irreducibility implies that we won't be able to do it. Might there though still be special cases where we can? In effect, those would have to correspond to pockets of computational reducibility, where we can manage to find a compressed description of the cellular automaton behavior. There are cases like this where there
isn't strict periodicity, but where in the end there's basically repetitive behavior; this is a particular rule which in effect has period 480. And there are cases of nested behavior, which is never periodic, but which is nevertheless simple enough to be predictable. But there are also always surprises, like this example here, which eventually resolves to have period 6, but only after 7,129 steps. So what does all this mean for our adaptive evolution process? It implies that in principle we could miss a very long lifetime for a particular rule, assuming it to be infinite. In a biological analogy, we might have a genome that leads to unbounded, perhaps tumor-like growth, but where actually the growth in the end unexpectedly stops. Next, computation-theoretic perspectives and busy beavers. What we're asking about the dying out of patterns in cellular automata is directly analogous to the classic halting problem for Turing machines, or the termination problem for term rewriting, Post tag systems and so on. And in looking for cellular automata that have the longest-lived patterns, we are studying a cellular automaton analog of the so-called busy beaver problem for Turing machines. We can summarize the results that we've got so far for our cellular automata here, all for single-cell initial conditions. And if we look at the profiles, the widths of nonzero cells, for the patterns generated by these winning rules, here are the results that we get; essentially the integrals of these curves are what we listed as areas back here. For the reasons we were talking about just now, we can only be certain that we've found lower bounds on the actual maximum lifetime, though except in the last few cases we looked at here, it seems very likely that we do in fact have the maximum lifetime. It's somewhat sobering, though, to compare with known results for maximum busy beaver lifetimes for Turing machines. Let me show you those results, where
now s is the number of Turing machine states, the Turing machines are started from blank tapes, and they're taken to halt when they reach a particular halt state. Sufficiently small Turing machines can have only modest lifetimes, but even slightly bigger Turing machines can have vastly larger lifetimes. And in fact it's a consequence of the undecidability of the halting problem for Turing machines that the maximum lifetime grows with the size of the Turing machine faster than any computable function, i.e. any function that can be computed in finite time by a Turing machine, or whose value can be proved by a finite proof in a finite axiom system. So you see some very big lifetimes there, like a power tower of 10s iterated 15 times, and so on. But OK, the maximum lifetime increases with the size of the rule, for a Turing machine or a cellular automaton. But what defines the size of a rule? Presumably it should be roughly the number of independent bits needed to specify the rule, which we can think of as an approximate measure of its information content, or something like log base 2 of the number of possible rules of its type. So at the outset we might imagine that, for example, all 2^32 k = 2, r = 2 rules would need 32 bits to specify them. But as we already talked about, in some cases some of the bits in the rule don't matter when it comes to determining the patterns they produce. And what we see is that the more bits that matter, and so have to be specified, the longer the lifetimes that are possible; we saw a version of this picture before. Well, so far we've been discussing only cellular automata with single-cell initial conditions. If we use more complicated initial conditions, what we're effectively doing is adding more information content into the system, with the result that the maximum lifetimes can potentially get larger. And as an example, here are the possible maximum lifetimes for k = 2, r = 3/2 rules with a sequence of possible initial conditions.
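For comparison, the directly analogous Turing machine computation is tiny to set up. This sketch uses the standard 2-state, 2-symbol busy beaver champion, which halts after 6 steps leaving 4 ones on the tape:

```python
def run_tm(prog, max_steps=10_000):
    """Run a Turing machine from a blank tape.  prog maps (state, symbol) to
    (write, move, new_state); returns (steps, ones on tape) or None if no halt."""
    tape, state, pos = {}, 'A', 0
    for step in range(1, max_steps + 1):
        write, move, state = prog[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if state == 'H':
            return step, sum(tape.values())
    return None

# the standard 2-state, 2-symbol busy beaver champion
bb2 = {('A', 0): (1, +1, 'B'), ('A', 1): (1, -1, 'B'),
      ('B', 0): (1, -1, 'A'), ('B', 1): (1, +1, 'H')}
```

Exactly as with the cellular automaton lifetimes, any cutoff like max_steps can only ever give a lower bound on the true maximum halting time.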
As the initial conditions get larger, the lifetimes progressively get slightly larger. Next, probabilistic approximations. Cellular automata are of course deterministic systems: given a particular cellular automaton rule and a particular initial condition, every aspect of the behavior that is generated is completely determined. But is there any way that we can approximate this behavior by some probabilistic model, or might we at least usefully be able to use such a model if we look at the aggregate properties of large numbers of different rules? One hint along these lines comes from the power-law distributions that we found above for the frequencies of different possible lifetimes for cellular automata of given types, and we might wonder whether such distributions, and perhaps even their exponents, like the −3.5 we found, could be derived from some probabilistic model. One possible approach is to approximate a cellular automaton by a probabilistic process, say one in which a particular cell becomes black with probability p if either it or its neighbors were black on the step before. Here are a few examples of what can happen with this so-called directed percolation setup. The behavior varies greatly with the probability p: for small p everything dies out, while for larger p it fills in. And if we look at the final density starting from random initial conditions, there's a sharp phase transition around p = 0.54. If we instead start not from random initial conditions but from a single black initial cell, one sees a slightly different transition. One can also plot the probabilities for different survival times, or lifetimes, for this pattern, and right around the transition the distribution of lifetimes follows a power law; actually it's roughly a −1 power law, which happens to be what one gets from the mean-field-theory estimate in this case. So how does this relate to cellular automata?
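The directed percolation setup described here is only a few lines. This is a sketch of the probabilistic model itself, not of any particular cellular automaton: starting from a single seed, every cell that had a black cell in its neighborhood on the previous step becomes black independently with probability p:

```python
import random

def dp_survival_time(p, max_steps=100, rng=random):
    """Survival time of the directed percolation pattern from one seed (None = alive)."""
    on = {0}                                    # positions of black cells
    for step in range(1, max_steps + 1):
        candidates = {i + d for i in on for d in (-1, 0, 1)}
        on = {i for i in candidates if rng.random() < p}
        if not on:
            return step
    return None

def survival_fraction(p, trials=200, max_steps=100, seed=0):
    """Fraction of runs whose pattern is still alive after max_steps."""
    rng = random.Random(seed)
    return sum(dp_survival_time(p, max_steps, rng) is None
               for _ in range(trials)) / trials
```

Scanning p shows the transition: well below the critical value everything dies out quickly, well above it the pattern essentially always survives, and right around the transition is where the power-law distribution of lifetimes appears.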
Let's say we have a k = 2 rule, and we suppose that the colors of cells can be approximated as somehow random. Then we might suppose that the patterns we get could be like in our probabilistic model, and a potential source for the value of p to use would be the fraction of cases in the rule that give a black cell as output. So if we plot the lifetimes of k = 2, r = 2 rules against these fractions, we see that the longest lifetimes do occur when a little under half the outcomes are black, though notice this is also where the binomial distribution implies the largest number of rules are concentrated. Well, if we don't try thinking about the details of cellular automaton evolution, but instead just consider the boundaries of the finite-lifetime patterns that we generate, we can imagine approximating these, say for symmetric rules, just by random walks, which, when they collide, correspond to the pattern dying out. Then it turns out the standard theory of random walks tells us that the probability to survive for τ steps is approximately proportional to τ^(−3/2) for large τ. It's a power law, though not immediately one of the same ones that we've observed for our cellular automaton lifetimes. Next, other adaptive evolution strategies. In what we've done so far, we've always taken each step of our process of adaptive evolution to pick an outcome of equal or greater fitness. What if we adopt a more impatient procedure, in which at each step we insist on an outcome that has strictly greater fitness? With k = 2 it's simply not possible with this procedure, at least with a null initial condition, to escape the null rule: everything that can be reached with one mutation still has lifetime 1. With k = 3 it's possible to go one step, not very excitingly, as captured by this picture here. That's if we're assuming that we have to reach greater fitness with just one mutation. But what if we allow two mutations at a time? Well, then we can make progress. Here's the multiway system in this case for k = 2, r = 2 rules, showing the number of mutations being made on each connection.
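Incidentally, the τ^(−3/2) random-walk survival law mentioned above can be checked exactly rather than by simulation: if the gap between the two pattern boundaries performs a ±1 random walk, the lifetime is the walk's first-passage time to zero, whose distribution follows from simple dynamic programming:

```python
import math

def first_passage_probs(max_t):
    """Exact P(a +/-1 random walk started at 1 first hits 0 at step t), for t <= max_t."""
    dist = {1: 1.0}                 # distribution over positions > 0
    probs = {}
    for t in range(1, max_t + 1):
        new, absorbed = {}, 0.0
        for pos, p in dist.items():
            for nxt in (pos - 1, pos + 1):
                if nxt == 0:
                    absorbed += p / 2
                else:
                    new[nxt] = new.get(nxt, 0.0) + p / 2
        probs[t] = absorbed
        dist = new
    return probs

probs = first_passage_probs(401)
# effective power-law exponent of the lifetime distribution between t = 101 and 401
exponent = math.log(probs[101] / probs[401]) / math.log(401 / 101)
```

For large t the effective exponent computed this way comes out close to 3/2, matching the standard first-passage result.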
We don't reach as many phenotypic patterns as by using single mutations and allowing fitness-neutral moves, but where we do get, we get there much quicker, without any back-and-forth in fitness-neutral spaces. If we allow up to three mutations, we get still further, and indeed we seem to get a pretty good representative sampling of what's out there in this rule space, even though we reached only 37 rules, compared to the 777,624, albeit with many duplicated phenotypic patterns, from our standard approach allowing neutral moves. For k = 3, r = 1 symmetric rules, single mutations can only get two steps, but if we allow up to two mutations we can go much further, and the fact that we don't now have to deal with neutral moves means we can explicitly construct at least the first few steps of the multiway graph in this case. We can go further if at each step we just pick a random higher-fitness rule reached with two or fewer mutations; here's an example of how that works. The adaptive evolution histories here can be generated, in effect, by randomly trying a series of possibilities at each step, then picking the first one that exhibits increased fitness. Another approach is to use what amounts to local exhaustive search: at each step, look at the results from all possible mutations, and pick one that gives the largest fitness. At least in smaller rule spaces it's common that there will be several results with the same fitness, and as an example we can just pick at random between these. One might think that this approach would always be an optimization of the adaptive evolution process, but in practice the systematic character of looking at all possible mutations can end up making it get stuck, in a sense repeatedly trying to do the same thing even if it isn't working. Something of an opposite approach involves loosening our criteria for which paths can be chosen, and
for example allowing paths that temporarily reduce fitness, say by one step of lifetime. These are examples of such paths. So in effect here we're allowing less-than-maximally-fit organisms to survive, and we can represent the overall structure of what's happening by a multiway graph, which now includes backtracking to lower fitnesses. But although the details are different, in the end it doesn't seem as if allowing this kind of backtracking has any dramatic effect. Somehow the basic phenomena around the process of adaptive evolution are strong enough that most of the details of how the adaptive evolution is done don't ultimately matter much. Next, an aside: sexual reproduction. In everything we've done so far, we've been making mutations only to individual rules. But there's another mechanism that exists in many biological organisms: sexual reproduction, in which in effect a pair of rules, i.e. genomes, get mixed to produce a new rule. As a simple model of the crossover that typically happens with actual genomes, we can take two rules and splice together the beginning of one rule with the end of another, like this. In general there will be many ways to combine pairs of rules like this. In a direct analogy to our Physics Project, we can represent such recombinations as events that take two rules and produce one, so the analog of our multiway graph for all possible paths of adaptive evolution by mutations is now what we call in our Physics Project a token-event graph. In dealing just with mutations, we were able to take a single rule and progressively modify it; now we always have to work with a population of rules, combining them two at a time to generate new rules. We can represent the possible combinations among one set of rules in a picture like this. There are at this point many different choices that we could make about how to set up our model; the particular approach we'll use takes just n of the n-choose-2, i.e. n(n − 1)/2, possible combinations.
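Here's a minimal sketch of the crossover step, under assumptions of my own for concreteness: k = 3, r = 1 rules as tuples of 27 outputs with the all-white case fixed at white, children kept if at least as fit as their less-fit parent, and random parents carried forward to restore the population size. As noted in the text, exactly how to set this up is a modeling choice:

```python
import random

def ca_lifetime(rule, max_steps=50):
    """Lifetime of the k = 3, r = 1 pattern from a single seed (None = survived)."""
    state = (1,)
    for step in range(1, max_steps + 1):
        padded = (0, 0) + state + (0, 0)
        new = [rule[9 * padded[i] + 3 * padded[i + 1] + padded[i + 2]]
               for i in range(len(padded) - 2)]
        while new and new[0] == 0:
            new.pop(0)
        while new and new[-1] == 0:
            new.pop()
        state = tuple(new)
        if not state:
            return step
    return None

def fitness(rule):
    life = ca_lifetime(rule)
    return 0 if life is None else life

def crossover(a, b, rng):
    """Splice the beginning of rule a onto the end of rule b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def generation(pop, rng):
    """One round: attempt len(pop) crossovers, reject less-fit children,
    then carry forward random parents to restore the population size."""
    children = []
    for _ in range(len(pop)):
        a, b = rng.sample(pop, 2)
        child = crossover(a, b, rng)
        if fitness(child) >= min(fitness(a), fitness(b)):
            children.append(child)
    while len(children) < len(pop):
        children.append(rng.choice(pop))
    return children[:len(pop)]
```

Because the spliced prefix always starts with the all-white case, the quiescent background is automatically preserved by crossover.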
So in this case we might take just these possible combinations. Then for each of these selected combinations we attempt a crossover, keeping those children, which we can draw here between their parents, that are not rejected as a result of having lower fitness. Finally, to maintain our gene pool, we can carry forward parents selected at random, so that we still end up with n rules in all cases. And yes, even though we've attempted to make this whole procedure as clean as possible, it's still a mess, which seems to be inevitable, and which, as we'll discuss later, has bedeviled essentially all computational studies of evolution in the past. OK, so what happens when we apply this sexual reproduction procedure, say, to k = 3, r = 1 rules? We'll pick four rules at random as our initial population (and yes, two happen to produce the same pattern here). Then in a sequence of steps we'll successively pick various combinations of these rules, and here are the distinct phenotype patterns produced in this process. Note that even though there can be multiple copies of the same phenotype pattern, the underlying genotypes are always distinct. As a final form of summarization of what's going on here, we can just plot the successive fitnesses of the patterns we generate, with the size of each dot reflecting the number of times a particular fitness occurs. So this is the number of rounds of evolution along the bottom, and up the vertical axis is the fitness (we probably should have labeled these). In this case we reach a steady state after nine steps. The larger the population, the longer the adaptive evolution will typically keep going. So here are some examples with population 10, showing all the patterns that are achieved; and if we just show in each case the longest-lifetime rule we've got so far, here's what we see in those two different examples. The results aren't obviously different from what we were finding with mutation above, even though now
we've got a much more complicated model, with a whole population of rules rather than a single rule. One obvious difference, though, is that here we can end up with overall cycles of populations of rules, whereas in the pure mutation case that can only happen among fitness-neutral rules. Here are a few other examples, now obtained after 500 steps, with population 25 and with population 50. And as far as one can tell, even here there are no substantial differences from what we saw with mutation alone. Certainly there are detailed features introduced by sexual reproduction and crossover, but for our purposes in understanding the big picture of what's happening in adaptive evolution, it seems sufficient to do what we've done so far and consider only mutation. Next, an even more minimal model. By investigating adaptive evolution in cellular automata, we're already making dramatic simplifications relative, say, to actual biology. But in the effort to understand the essence of the phenomena we see, it's helpful to go even further, and instead of thinking about computational rules and their behavior, just think about vertices on a mutation graph, each assigned a certain fitness. So as an example, we can set up a 2D grid, assigning each point a certain random fitness. Then, starting from a minimum-fitness point, we can follow the same kind of adaptive evolution procedure as above, at each point going to a neighboring point with an equal or greater fitness. Typically we don't manage to go far before we get stuck, though with the uniform distribution of fitness values that we're using here, we still usually end up on a fairly large fitness value. We can summarize the possible paths that one can take here by a multiway graph again (while these nodes are drawn the same size, they should really be different sizes in this graph). In our cellular automaton rule space, and for that matter in biology, neighboring points don't just have independent random fitnesses; instead the fitnesses are determined by a definite computational procedure.
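This "even more minimal model" can be sketched directly. A hypothetical minimal version, with an n × n grid of independent uniform fitnesses and a walk that starts at the minimum-fitness point and moves to a random neighbor of equal-or-greater fitness until it gets stuck:

```python
import random

def adaptive_walk(n=50, seed=0, max_moves=10_000):
    """Adaptive walk on an n x n grid of i.i.d. uniform random fitnesses: start at
    the minimum-fitness point, repeatedly move to a random neighbor with
    equal-or-greater fitness, stop when stuck at a local maximum."""
    rng = random.Random(seed)
    fit = [[rng.random() for _ in range(n)] for _ in range(n)]
    x, y = min(((i, j) for i in range(n) for j in range(n)),
               key=lambda p: fit[p[0]][p[1]])
    path = [(x, y)]
    for _ in range(max_moves):
        nbrs = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < n and 0 <= y + dy < n]
        ups = [p for p in nbrs if fit[p[0]][p[1]] >= fit[x][y]]
        if not ups:
            break                  # stuck: no neighbor is at least as fit
        x, y = rng.choice(ups)
        path.append((x, y))
    return path, fit
```

The walk typically halts after only a handful of moves, but since the stuck point beats all four of its neighbors, its fitness value is usually fairly high, matching what's described above.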
As a simple approximation, we can just take the fitness of each point to be a particular function of its graph coordinates. If the function forms something like a uniform hill, then the adaptive evolution procedure will just climb it. But as soon as the function has systematic bumpiness, there's a tremendous tendency to quickly get stuck. And if there's some unexpected spot of high fitness, adaptive evolution typically won't find it, and it certainly won't if it's surrounded by a lower-fitness moat. So what happens if we increase the dimensionality of the mutation space in which we're operating? Basically, it becomes easier to find a path that increases fitness, and we can see this whole phenomenon if we look at Boolean hypercubes in increasing numbers of dimensions. But ultimately this relies on the fact that in the neighborhood reachable by mutations from a given point, there'll be a sufficiently random collection of fitness values that it'll likely be possible to find a direction that's going up in fitness. Yet this alone won't in general be enough, because we also need it to be the case that there's enough regularity in the fitness landscape that we can systematically navigate it to find its maximum, and that the maximum is not somehow unexpected and isolated. Next: what can adaptive evolution achieve? We've seen that adaptive evolution can be surprisingly successful at finding cellular automata that produce patterns with long but finite lifetimes. But what about other types of traits? What can and cannot adaptive evolution ultimately manage to do? For example, what if we're trying to find cellular automata whose patterns don't just live as long as possible, but instead die after a specific number of steps? It's clear that within any finite set of rules, say with particular k and r, there'll only be a limited collection of possible lifetimes. For symmetric k = 2, r = 2 rules the possible lifetimes are basically these ones here. But as soon as we're dealing
with even k = 3, r = 1 rules, it's already in principle possible, for example, to get every lifetime up to 100. But what about adaptive evolution: how well does it do at reaching rules with all those lifetimes? Let's say we do single point mutations as before, but now we accept a mutation if it leads not specifically to a larger finite lifetime, but to a lifetime that is closer in absolute magnitude to some desired lifetime. And importantly, in both cases we allow fitness-neutral mutations that leave the lifetime the same. So here are examples. If we try to get lifetime exactly 50 in k = 3, r = 1 rules, it gets close, sometimes it overshoots, but at least in these particular examples it never quite makes it. Here's what we see if we look at the lifetimes achieved with 100 different random sequences of mutations: basically, they mostly get stuck at lifetimes close to 50, but not exactly 50. It's not that k = 3, r = 1 rules can't yield lifetime 50; exhaustive search shows that even many symmetric such rules can, and here are examples. It's just that our adaptive evolution process usually gets stuck before it reaches rules like these. Even though there's usually enough room to maneuver in k = 3, r = 1 rule space to get to generally longer lifetimes, there's not enough to specifically get to lifetime 50. What about k = 4, r = 1 rule space? There are now not 10^12 but about 10^38 possible rules, and in this rule space it becomes quite routine to be able to reach lifetime 50 through adaptive evolution. Here are some examples of that... well, that doesn't seem to be the right picture... yes, here are a bunch of examples where we reach lifetime 50 directly with four-color rules by adaptive evolution. It can sometimes take a while, but most of the time in this rule space it's possible to get to exactly lifetime 50, and we can see that here if we just look at the fitness as a function of the number of steps.
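Changing the objective is just a change of fitness function in the same kind of mutation loop. A sketch, assuming rules encoded as 27 base-3 neighborhood outputs with the all-white case fixed at white: fitness is minus the distance of the lifetime from the target, patterns that survive the cutoff are treated as worst, and fitness-neutral mutations are allowed:

```python
import random

RULE_LEN = 27                      # k = 3, r = 1: outputs for all 27 neighborhoods

def ca_lifetime(rule, max_steps=120):
    """Lifetime of the pattern from a single seed (None = survived the cutoff)."""
    state = (1,)
    for step in range(1, max_steps + 1):
        padded = (0, 0) + state + (0, 0)
        new = [rule[9 * padded[i] + 3 * padded[i + 1] + padded[i + 2]]
               for i in range(len(padded) - 2)]
        while new and new[0] == 0:
            new.pop(0)
        while new and new[-1] == 0:
            new.pop()
        state = tuple(new)
        if not state:
            return step
    return None

def target_fitness(rule, target=50):
    life = ca_lifetime(rule)
    if life is None:
        return -10 ** 9            # patterns that don't terminate are worst of all
    return -abs(life - target)     # closer to the target lifetime is fitter

def evolve_to_target(target=50, steps=400, seed=0):
    rng = random.Random(seed)
    rule = [0] * RULE_LEN
    best = target_fitness(rule, target)
    for _ in range(steps):
        mutant = list(rule)
        pos = rng.randrange(1, RULE_LEN)
        mutant[pos] = (mutant[pos] + rng.randrange(1, 3)) % 3
        f = target_fitness(mutant, target)
        if f >= best:              # fitness-neutral mutations are allowed
            rule, best = mutant, f
    return rule, -best             # final distance from the target lifetime
```

Runs of this kind of loop are what produce the "close to 50 but not exactly 50" behavior described for k = 3, r = 1 rules.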
What happens with other lifetime goals? Even symmetric k = 3, r = 1 rules can achieve many lifetime values; indeed, the first missing values are 129, 132, 139 and so on, and, for example, many multiples of 50 can be achieved (here are examples). But it becomes increasingly difficult for adaptive evolution to reach these specific goals. Increasing the size of the rule space always seems to help. So for example with k = 4, r = 1, if one's aiming for lifetime 100, the actual distribution of lifetimes reached is this, peaked very much around 100. In general the distribution gets broader as the lifetime one is seeking gets longer. And we saw before that across the whole space of, say, k = 4, r = 1 rules (actually, we should have said k = 3, r = 1 rules) the frequency of progressively larger lifetimes falls roughly according to a power law. So this means that the fractional region of rule space that achieves a given lifetime gets progressively smaller, with the result that the paths followed by adaptive evolution are typically progressively more likely to get stuck before they reach that particular lifetime. OK, what about other kinds of objectives, say ones more related to the morphologies of patterns? As a simple example, let's consider the objective of maximizing the widths of finite-lifetime patterns. We can try to achieve this by adaptive evolution in which we reject any mutations that lead to decreased width, where by width I mean the maximum horizontal extent of the pattern. Once again, this process manages to discover all sorts of mechanisms for achieving larger widths. Here we're labeling every pattern by its height, i.e. its lifetime, and its width; and we've got some more high-width patterns there. Well, there are certainly structural constraints here. For example, the width can't be too large relative to the height, because if it's too large, patterns will tend to grow forever.
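Objectives like these just require measuring more traits of the pattern. Here's a sketch of the measurement itself, under the same assumed encoding (k = 3, r = 1, 27 outputs, all-white case fixed at white): the height is the lifetime, the width is the maximum horizontal extent, and the aspect ratio follows from the two:

```python
def pattern_stats(rule, max_steps=200):
    """Height (lifetime), maximum width and aspect ratio of the finite pattern
    from a single seed, or None if it hasn't died out within max_steps."""
    state, max_width = (1,), 1
    for step in range(1, max_steps + 1):
        padded = (0, 0) + state + (0, 0)
        new = [rule[9 * padded[i] + 3 * padded[i + 1] + padded[i + 2]]
               for i in range(len(padded) - 2)]
        while new and new[0] == 0:
            new.pop(0)
        while new and new[-1] == 0:
            new.pop()
        state = tuple(new)
        if not state:
            return {'height': step, 'width': max_width,
                    'aspect': max_width / step}
        max_width = max(max_width, len(state))
    return None
```

Any of the objectives below (maximal width, aspect ratio near 3 or near π) then amounts to a fitness function over this dictionary.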
But what if we specifically try to select for maximal pattern aspect ratio, i.e. the ratio of width to height? In essentially every case we see there, the adaptive evolution has had an effect. In what we've seen so far, adaptive evolution has invented many different mechanisms for achieving the objective we've set; but here it turns out we essentially see the same idea being used over and over again, presumably because this is the only way to achieve our objective given the overall structure of how the underlying rules we're using work. So what if we ask for something more specific? Let's say we ask that the aspect ratio be as close to 3 as possible. Much of the time the adaptive evolution finds a solution that is correct, if trivial, but sometimes it finds other solutions, like these here, often surprisingly elaborate and complicated ones. What about if our goal is an aspect ratio of around π, 3.14...? It turns out that adaptive evolution can still do rather well here, even just with symmetric k = 3, r = 1 rules; here are some examples. We can also ask about properties of the inside of the pattern. For example, we can ask to maximize the lengths of uniform runs of non-white cells in the center column of the pattern, and once again adaptive evolution can successfully lead us to rules, like these random examples here, where that length is large. We can go on and ask for still more details, say asking about runs of particular lengths, or the presence or number of particular subpatterns. And eventually, just like when we asked for too long a lifetime, we'll find that the cases we're looking for are too sparse, and adaptive evolution, at least in a given rule space, won't be able to find them, even if exhaustive search could still identify at least a few examples. But just what kinds of objectives, or fitness functions, can be handled, and how well, by adaptive evolution operating, for example, on the
raw material of cellular automata? It's an important question, an analog of which is also central to the investigation of machine learning. But as of now we don't really have the tools to address it. It's somehow reminiscent of asking what kinds of functions can be approximated, and how well, by different methods or basis functions, but it's more complicated. Solving it, though, would tell us a lot about the reach of adaptive evolution processes, not only for biology but also for machine learning. Next: what it means for what's going on in biology. How do biological organisms manage to be the way they are, with all their complex and seemingly clever solutions to such a wide range of challenges? Is it just natural selection that does it, or is there in effect more going on? And if natural selection does it, how does it actually manage it? From the point of view of traditional engineering, what we see in biology is often very surprising: much more complex and clever than we'd imagine ever being able to create ourselves. But is the secret of biology in a sense just natural selection? Well, actually, there's often an analog of natural selection going on even in engineering, as different designs get tried and only some get selected. But at least in traditional engineering, a key feature is that one always tries to come up with designs where one can foresee their consequences. Biology is different: mutations to genomes just happen, without any notion that their consequences can be foreseen. But one still might assume that, when guided by natural selection, the results wouldn't be too different from what we'd get in traditional engineering. There's a crucial piece of intuition missing here, though. It has to do with how randomly chosen programs behave. We might have assumed, based on our typical experience with programs we explicitly construct for particular purposes, that a simple random program wouldn't ever do anything terribly interesting or complicated. But the surprising discovery I made
in the early 1980s is that this isn't true, and that instead it's a ubiquitous phenomenon that in the computational universe of possible programs one can get immense complexity even from very simple programs. So this means that as mutation operates on a genome, it's essentially inevitable that it'll end up sampling programs that show highly complex behavior. At the outset one might have imagined that such complexity could only be achieved by careful design, and would inevitably be at best rare. But the surprising fact is that, because of how things fundamentally work in the computational universe, it's instead easy to get. But what does complexity have to do with creating successful organisms? To create a successful organism that can prosper in a particular environment, there fundamentally has to be some way to get a genome that will solve the necessary problems, and this is where natural selection comes in. But the fact that it can work is something that's not at all obvious. There are really two issues. The first is whether a program, i.e. a genome, even exists that will solve the necessary problems. And the second is whether such a program can be found by a thread of adaptive evolution that goes only through intermediate steps that are fit enough to survive. As it turns out, both of these issues are related to the same fundamental features of computation that are also responsible for the ubiquitous appearance of complexity. Given some underlying framework, like cellular automata, or like the basic apparatus of life, is there some rule that can be implemented in that framework that will achieve some particular computational objective? The Principle of Computational Equivalence says that generically the answer will be yes. In effect, given almost any underlying hardware, it'll ultimately be possible to come up with software, i.e. a rule, that achieves almost any physically possible objective, like growing an organism of at least some kind that can survive in a particular
environment. But how can we actually find a rule that achieves this? In principle we could do exhaustive search, but that will be exponentially difficult, and in all but toy cases will be utterly infeasible in practice. So what about adaptive evolution? Well, that's the big question, and what we've seen here is that, rather surprisingly, simple mutation and selection, i.e. the mechanisms of natural selection, very often provide a dramatic shortcut for finding rules that do what we want. So why is this? In effect, adaptive evolution is finding a path through rule space that gets to where we want to go. But the surprising part is that it's managing to do this one step at a time: it's just trying random variations, i.e. mutations, and as soon as it finds one that's not a step down in fitness, it'll take it and keep going. At the outset it's certainly not obvious that this will work. In particular, it could be that at some point there just won't be any way forward: all directions will lead only to lower fitness, and in effect the adaptive evolution will get stuck. But the key observation from the experiments in our simple model here is that this typically doesn't happen. And there seem to be basically two things going on. The first is that rule space is in effect very high dimensional, so there are many directions to choose from in trying to find one that will allow one to take a step forward. But on its own this isn't enough, because there could be correlations between these directions that would mean that if one's blocked in one direction, one would inevitably be blocked in all others. So why doesn't this happen? Well, it seems to be the result of the fundamental computational phenomenon of computational irreducibility. A traditional view, based on experience with mathematical science, had been that if one knew the underlying rule for a system, then this would immediately let one predict what the system would do. But what became clear from my explorations in the 1980s and 1990s is that in the computational universe this generically isn't true, and instead the only way one can systematically find out what most computational systems will do is explicitly to run their rules step by step, doing in effect the same irreducible amount of computational work that they do. So if one's just presented with behavior from the system, one won't be in a position to decode it and see its simpler origin: unless one's capable of doing as much computational work as the system itself, one will just have to consider what it's doing as more or less random. And indeed this seems to be the root of many important phenomena, such as the Second Law of thermodynamics, and I suspect it's also at the root of the effectiveness of adaptive evolution, notably in biology. What computational irreducibility implies is that around every point in rule space there'll be a certain effective randomness to the fitness one sees, and if there are many dimensions to rule space, that means it's overwhelmingly likely that there'll be paths to success in some directions from that point. But will the adaptive evolution find them? We've assumed that there's a series of mutations to the rule, all happening at random, but the point is that if there are n elements in the rule, then after some fraction of n mutations we should find our success direction. If we were doing exhaustive search we'd instead have to try about k^n possible rules. At the outset it might seem conceivable that the sequence of mutations could somehow cleverly probe the structure of rule space, knowing what directions would or would not be successful. But the whole point is that going from a rule, i.e. a genotype, to its behavior, i.e. a phenotype, is generically a computationally irreducible process. So, assuming that mutations are generated in a computationally bounded way, it's inevitable that they can't break computational irreducibility, and so we'll experience the fitness landscape in rule space as effectively random
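The single-lineage mutate-and-select loop described here is simple enough to sketch directly in code. Below is a minimal illustrative sketch in Python (my own reconstruction for this transcript, not code from the piece itself): a k=3, r=1 cellular automaton rule is just a lookup table of 27 entries, fitness is the lifetime of the pattern grown from a single seed cell, and each step tries one random point mutation, keeping it whenever fitness doesn't decrease, so fitness-neutral mutations are retained. The width, step cap, and the choice to score never-dying patterns as fitness 0 are assumptions of the sketch:

```python
import random

K, R = 3, 1                 # 3 colors, nearest-neighbor (range-1) rule
N_CASES = K ** (2 * R + 1)  # 27 neighborhood cases for a k=3, r=1 rule
WIDTH, MAX_T = 40, 60       # periodic array width and lifetime cap (assumptions)

def run_lifetime(rule):
    """Run the CA from a single seed; return steps until the pattern dies out.
    Patterns still alive at MAX_T are scored 0, i.e. rejected, so only rules
    whose patterns die at some finite time count as having a lifetime."""
    cells = [0] * WIDTH
    cells[WIDTH // 2] = 1  # single non-white seed cell
    for t in range(1, MAX_T + 1):
        cells = [rule[(cells[(i - 1) % WIDTH] * K + cells[i]) * K
                      + cells[(i + 1) % WIDTH]]
                 for i in range(WIDTH)]
        if not any(cells):
            return t       # pattern died out after exactly t steps
    return 0               # survived past the cap: treated as unfit

def evolve(steps=2000, seed=0):
    """Single-lineage adaptive evolution: random point mutations,
    accepting any mutation that is not a step down in fitness."""
    rng = random.Random(seed)
    rule = [0] * N_CASES   # the 'null' rule: every pattern dies immediately
    fitness = run_lifetime(rule)
    for _ in range(steps):
        trial = rule[:]
        trial[rng.randrange(N_CASES)] = rng.randrange(K)  # one point mutation
        trial[0] = 0       # keep the all-white neighborhood quiescent
        f = run_lifetime(trial)
        if f >= fitness:   # neutral moves allowed: accept anything not worse
            rule, fitness = trial, f
    return rule, fitness
```

Note that because the acceptance test is `f >= fitness` rather than `f > fitness`, the lineage can wander through fitness-neutral regions of rule space, which, as discussed later, turns out to matter in this single-lineage setup.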
okay, but what about achieving the characteristics an organism needs? What seems to be critical is that these characteristics are in a sense computationally simple: we want an organism to live long enough, or be tall enough, or whatever. It's not that we need the organism to perform some specific computationally irreducible task. Yes, there are all sorts of computationally irreducible processes happening in the actual development and behavior of an organism, but as far as biological evolution is concerned, all that matters is ultimately some computationally simple measure of fitness. It's as if biological evolution is, in the sense of my recent observer theory, a computationally bounded observer of underlying computationally irreducible processes, and to the observer what emerges is a kind of simple law of biological evolution, and the idea that yes, it is possible just by natural selection to successfully generate all sorts of characteristics. There are all sorts of consequences of this for thinking about biology, for example in thinking about where complexity in biology comes from. Is it generated by natural selection, perhaps reflecting the complicated sequence of historical accidents embodied in the particular collection of mutations that occurred, or is it from somewhere else? In the picture we've developed here, it's basically from somewhere else, because it's essentially a reflection of computational irreducibility. Having said that, we should remember that the very possibility of being able to have organisms with such a wide range of different forms and functions is a consequence of the universal computational character of their underlying setup, which in turn is closely tied to computational irreducibility. And it's in effect because natural selection is so coarse in its operation that it does not somehow avoid the ubiquitous computational irreducibility that exists in rule space, with the result that when we look inside biological systems we tend to see computational irreducibility, and the complexity associated with it. Something that we've seen over and over again here is that yes, adaptive evolution manages to solve a problem, but its solution looks very complex to us. There might be some simple engineering solution involving, say, a very regular pattern of behavior, but that's not what adaptive evolution finds. Instead it finds something that to us is typically very surprising: very often an unexpected and clever solution in which lots of pieces fit together just right, in a way that our usual understand-what's-going-on engineering practices would never let us invent. We might not have expected anything like this to emerge from the simple process of adaptive evolution, but as the models we've studied here highlight, it seems to be an inevitable formal consequence of core features of computational systems. And as soon as we recognize that biological systems can be viewed as computational, then it also becomes something inevitable for them, and something we can view as, in a sense, formally derivable for them. At the outset we might not have been able to say what matters in the emergence of complexity in biology, but from the models we've studied and the arguments we've made, we seem to have quite firmly established that it's a fundamentally computational phenomenon, one that relies on certain general computational features of biological systems, and doesn't depend on their particular detailed components and structure. But how generic is the complexity that comes out of adaptive evolution? In other words, if we were to pick programs, say, completely at random, how different would the complexity they produce be from the complexity we see in programs that have been adaptively evolved for a purpose? The answer isn't clear, though to know it would provide important foundational input for theoretical biology. One has the general impression that computational irreducibility is a strong enough phenomenon that it's
the dominant force that determines behavior and produces complexity. But there's still usually something a bit different about the patterns we see from rules we found by adaptive evolution, compared to rules we pick at random. Often there seems to be a certain level of apparent mechanism: the details look complicated, and in some ways quite random, but there seems to be a kind of overall orchestration to what's going on. And whenever we can identify such regularities, it's a sign of some kind of computational reducibility. There's still plenty of computational irreducibility at work, but the high-fitness rules that we find through adaptive evolution typically seem to exhibit traces of their specialness, traces that manifest as at least some amount of computational reducibility. Whenever we manage to come up with a narrative explanation, or a natural law, for something, it's a sign that we've found a pocket of computational reducibility. If we say that a cellular automaton manages to live long because it generates certain robust geometric patterns, or for that matter that an organism lives long because it proofreads its DNA, we're giving a narrative that's based on computational reducibility. And indeed whenever we can successfully identify a mechanism in our cellular automaton behavior, we're in effect seeing computational reducibility. But what can we say about the aggregate of a whole collection of mechanisms? In a different context I've discussed the concept of a mechanoidal phase, distinguished, say, from solids and liquids by the presence of a bulk orchestration of underlying components. It's something closely related to what I've called class 4 behavior, and it's interesting to notice that if we look, for example, at the evolution from random initial conditions of rules we found from adaptive evolution, those same rules mostly show characteristic class 4 behavior. In other words, adaptive evolution is potentially bringing us to characteristically special places in rule space, perhaps suggesting that there's something characteristically special about the kinds of structures that are produced in biological systems. And if we could find a way to make general statements about that characteristic specialness, it would potentially lead us to a framework for constructing a new, broad formal theory in biology. What about correspondence with biological phenomena? The models we've studied here are extremely simple in their basic construction. At some level it's remarkable that, without for example including any biophysics or biochemistry, they can get anywhere at all in capturing features of biological systems and biological evolution. In a sense this is ultimately a reflection of the fundamentally computational character of biology, and the generality of computational phenomena. But it's very striking that even the patterns of cellular automaton behavior we see look very lifelike and organic. In actual biology even the shortest genomes are vastly longer than the tiny cellular automaton rules we've considered, but even by the time we're looking at the length-27 genomic sequences of k=3, r=1 cellular automata, there are already several trillion possible sequences, which seems to be enough to see many core combinatorially driven biological phenomena. As for biology-like phenomena: the running of a cellular automaton rule might at first seem very far away from the actual processes that create biological organisms, involving as they do things like the construction of proteins and the formation of elaborate functional and spatial structures. But there are more analogies than we might at first imagine. For example, it's common for only particular cases in the cellular automaton rule to be used in a given region of the pattern that's formed, much as particular genes are typically turned on in different tissues in biological organisms. And, for example, the geometrical restriction to a simple 1D array of cells doesn't seem to matter much: as soon as there's
sophisticated computation going on, we still get lots of structures that are actually surprisingly reminiscent of typical patterns of biological growth. One of the defining features of biological organisms is their capability for self-reproduction, and indeed if it wasn't for this kind of copying there wouldn't be anything like adaptive evolution to discuss. Our models don't attempt to derive self-reproduction; they just introduce it as something built into the models. And although we've considered several variants, we're basically also just building into our models the idea of mutations. What we find is that it seems as if point mutations, made one at a time, are enough to capture basic features of adaptive evolution. We've also primarily considered what amounts to a single lineage, in which there's just a single rule, or genome, at a given step: we do mutations, and we mostly implement natural selection just by keeping only rules that lead to patterns whose fitness is no less than we had before. If we had a whole population of rules it probably wouldn't be so significant, but in the simple setup we're using it turns out to be important that we don't reject fitness-neutral mutations. And indeed we've seen many examples where the system wanders around fitness-neutral regions of rule space before finally discovering some innovation that allows it to increase fitness. The way our models are set up, that wandering always involves changes in the genotype, but usually at most minor changes in the phenotype. So it's very typical to see long periods of apparent equilibrium, in which the phenotype changes rather little, followed by a jump to a new fitness level and a rather different genotype. And this observation seems quite aligned with the phenomenon of punctuated equilibrium often reported in the fossil record of actual biological evolution. Another key feature of biological organisms and biological evolution is the formation of distinct species, as well as distinct genera and so on. And indeed we ubiquitously see something that seems to be directly analogous in our multiway graphs of all possible paths of adaptive evolution: typically we see distinct branches forming, based on what seem like different mechanisms for achieving fitness. No doubt in actual biology there are all sorts of detailed phenomena related to reproductive or spatial isolation, but in our models the core phenomenon that seems to lead to the analog of branching in the tree of life is the existence of distinctly different computational mechanisms in different parts of rule space. It's worth noting that, at least with our finite rule spaces, branches can die out, with no successor species appearing in the multiway graph. And indeed, looking at the actual patterns produced by rules in different parts of the multiway graph, it's easy to imagine morphologically based taxonomic classifications that would be somewhat, if not perfectly, aligned with a phylogenetic tree defined by actual rule mutations. By the way, at a morphological level we quite often see some level of convergent evolution in our multiway graphs; in small examples we sometimes also see actual genomic convergence, which would typically be astronomically rare in actual biological systems. One of the remarkable features of our models is that they allow quite global investigation of the overall history of adaptive evolution. In many of the simple cases we've discussed, the rule space we're using is small enough that in a comparatively modest number of mutation steps we get to the highest fitness we can reach. But as the examples we saw with Turing machines suggest, expanding the size of the rules we're using even just a little can be expected to be sufficient to allow us to get astronomically further. And the further we go, the more mechanisms will be invented. It's an inevitable feature of systems that involve computational irreducibility that there are new and unpredictable things that will go on showing up forever, with new pockets of computational irreducibility along with new pockets of computational reducibility. So even after a few billion years, and the trillion generations and 10^40 or so organisms that have ever lived, there's still infinitely further for biological evolution to go, and more and more branches to be initiated in the tree of life, involving more and more new mechanisms. I suppose one might imagine that at some point biological organisms would reach a maximum fitness and go no further, but even in our simple model, with fitness measured in terms of pattern lifetime, there'll be no upper limit on fitness: given any particular lifetime, it's a feature of the fundamental theory of computation that there'll always be a program that will yield a larger lifetime. So one might think at some point enough is enough, the giraffe's neck is long enough, etc. But if nothing else, competition between organisms will always drive things forward: yes, a particular lineage of organisms achieved a certain fitness, but then another lineage can come along and get to that fitness too, forcing the first lineage to go even further so as not to lose out. Of course, in our simple model we're not explicitly accounting for interactions with other organisms, or other detailed properties of the environment, as well as countless other effects, and no doubt there are many biological phenomena that depend on these effects. But the key point here is that even without explicitly accounting for any of these effects, our simple model still seems to capture many core features of biological evolution. Biological evolution, and indeed adaptive evolution in general, is, it seems, fundamentally a computational phenomenon that robustly emerges quite independent of the details of systems. In the past few years our Physics Project has given strong evidence that the foundations of physics are fundamentally computational, with the core laws of physics arising as inevitable consequences of the way observers like us parse the ruliad of all possible
computational processes. And what we've seen here now suggests that there's a remarkable commonality between the foundations of physics and biology: both are anchored in computational irreducibility, and both sample slices of computational reducibility: physics because that's what observers like us do to get descriptions of the world that fit in our finite minds, biology because that's what biological evolution does in order to achieve the coarse objectives set by natural selection. The intuition of physics tends to be that there are ultimately simple models for things, whereas in biology there's a certain sense that everything is always infinitely complicated, with a new effect to consider at every turn. But presumably that's in large part because what we study in biology tends to quickly come face to face with computational irreducibility, whereas in physics we've been able to find things to study that avoid this. But now the commonality in foundations between physics and biology suggests that there should also be in biology the kind of structure we have in physics, complete with general laws that allow us to make useful broad statements. And perhaps the simple model I'm talking about here can help lead us there, and in the end help us build a new paradigm for thinking about biology in a fundamentally theoretical way. I had some historical notes that I'll go through as well. So, there's a long, if circuitous, history to the things I've talked about here. Basic notions of heredity, particularly for humans, were already widely recognized in antiquity, and plant breeding was practiced from the earliest days of agriculture, but it wasn't until the late 1700s that any kind of systematic selective breeding of animals began to be commonplace. Then in 1859 Charles Darwin described the idea of natural selection, whereby competition of organisms in their natural environment could act like artificial selection, and, he posited, would over long periods of time lead to the development of new species. He ended his big book Origin of Species with the claim that from the war of nature the production of the higher animals directly follows, and that whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been and are being evolved. So what he appears to have thought is that there would somehow follow from natural selection a general law, like the law of gravity, that would lead to the evolution of progressively more complex organisms, culminating in what he called the higher animals. But absent the kind of model I'm discussing here, nothing in the later development of traditional evolutionary theory really successfully supported this, or was able to give much analysis of it. Right around the same time as Darwin's Origin of Species, Gregor Mendel began to identify simple probabilistic laws of inheritance, and when his work was rediscovered at the beginning of the 1900s it was used to develop mathematical models of the frequencies of genetic traits in populations of organisms, with key contributions to what became the field of population genetics being made in the 1920s and 1930s by J.B.S. Haldane, R.A. Fisher and Sewall Wright, the last of whom came up with the concept of a fitness landscape. On a quite separate track, there had been efforts ever since antiquity to classify and understand the growth and form of biological organisms, sometimes by analogy to physical or mathematical ideas, and by the 1930s it seemed fairly clear that chemical messengers were somehow involved in the control of growth processes. But the mathematical methods used in population genetics basically only handled discrete traits, or simple numerical ones accessible to biometry, and didn't really have anything to say about something like the development of complexity in the forms of biological organisms. The 1940s saw the introduction of what amounted to electrical-engineering-inspired
approaches to biology, often under the banner of cybernetics. Idealized neural nets were introduced by Warren McCulloch and Walter Pitts in 1943, and soon the idea emerged, notably in the work of Donald Hebb in 1949, that learning in such systems could occur through a kind of adaptive evolution process. And by the time practical electronic computing began to emerge in the 1950s, there was a widespread belief that ideas from biology, including evolution, would be useful as an inspiration for what could be done. Often what would now just be described as adaptive algorithms were couched in biological evolution terms, and even when iterative methods were used for optimization, say in industrial production or engineering design, they were sometimes presented as being grounded in biological evolution. Meanwhile, by the 1960s there began to be what amounted to Monte Carlo simulations of population-genetics-style evolutionary processes. A particularly elaborate example was the work by Nils Barricelli on what he called numeric evolution, in which a fairly complicated numerical cellular-automaton-like competition between organisms, with randomness injected from details of data layout in computer memory, showed what he claimed to be biological-evolution-like phenomena such as symbiosis and parasitism. In a different direction, there was an attempt, notably by John von Neumann, to mathematize the foundations of biology, leading by the late 1950s to what we'd now call two-dimensional cellular automata, engineered in complicated ways to show phenomena like self-reproduction. The followup to this was mostly early theoretical computer science work, with no particular connection back to biology, and no serious mention of adaptive evolution. When the Game of Life was introduced in 1970, it was widely noted as doing lifelike things, but essentially no scientific work was done in this direction. By the 1970s, though, L-systems and fractals had introduced the idea of recursive tree-like or nested structures that could be generated by simple algorithms and rendered by computer graphics, and seemed to give forms close to some seen in biology. My own work on one-dimensional cellular automata, starting in 1981, focused on systematic scientific investigation of simple programs and what they do, with the surprising conclusion that even very simple programs can produce highly complex behavior. But while I saw this as informing the generation of complexity in things like the growth of biological organisms, I didn't at the time, as I'll talk about a bit later, end up seriously exploring any adaptive evolution angles. Still another thread of development concerned applying biological-like evolution not just to parameters but to operations in programs. So, for example, in 1958 Richard Friedberg at IBM tried making random changes to instructions in machine-code programs, but didn't manage to get this to do much; twenty years later, superoptimizers in practical compilers did begin to successfully use such techniques. Then in the 1960s John Holland, who had at first studied learning in neural nets, and was then influenced by Arthur Burks, who had worked on cellular automata with von Neumann, suggested representing what amounted to programs by simple strings of symbols that could readily be modified like genomic sequences. The typical idea was to interpret the symbols as computational operations, and to assign fitness based on the outcome of these operations; a genetic algorithm could then be set up by having a population of strings that was adaptively evolved. Through the 1970s and 1980s occasional practical successes were reported with this approach, particularly in optimization and data classification, with much being made of the importance of sexual-reproduction-inspired crossover operations. Something that began to be used in the 1980s was also the much simpler approach of simulated annealing, which involves randomly changing values rather than programs. By the beginning of the
1980s the idea had also emerged of adaptively modifying the structure of mathematical expressions, and of symbolic expressions representing programs. There were notable applications in computer graphics, for example by Karl Sims, as well as to things like the 1984 Core War game, involving competition between programs in a virtual machine. In the 1990s John Koza was instrumental in developing the idea of genetic programming, notably as a way to automatically create inventions, for example in areas like circuit and antenna design, and indeed to this day scattered applications of these methods continue to pop up, particularly in geometrical and mechanical design. From the very beginning there had been controversy around Darwin's ideas about evolution. First there was the issue of conflict with religious accounts of creation, but there were also often vigorous disagreements within the scientific community about the interpretation of the fossil record, and about how large-scale evolution was really supposed to operate. A notable issue, still very active in the 1980s, was the relationship between the freedom of evolution and the constraints imposed by the actual dynamics of growth in organisms, and interactions between organisms. And despite much insistence that the only reasonable scientific, as opposed to religious, point of view was that natural selection is all there is, there were nagging mysteries that suggested there must be other forces at work. Building on the possibilities of computer experimentation, as well as things like my own work on cellular automata, there emerged in the mid-1980s, particularly through the efforts of Chris Langton, a focus on investigating computational models of artificial life. This resulted in all sorts of simulations of ecosystems and so on that did produce a variety of evolution-related phenomena known from field biology, but typically the models were far too complex in their structure for it to be possible to extract fundamental conclusions from them. Still, there continued to be specific simpler experiments: for example, in 1986, in his book The Blind Watchmaker, Richard Dawkins made pictures of what he called biomorphs, produced by adaptively adjusting parameters for a simple tree-growth algorithm based on the overall shapes generated. In the 1980s, stimulated by my work, there were various isolated studies of rule evolution in cellular automata, as well as art and museum exhibits based on this, and in the 1990s there was more systematic work, notably by Jim Crutchfield and Melanie Mitchell, on using genetic algorithms to try to evolve cellular automaton rules to solve tasks like density classification. Around this time, evolutionary computation also began to emerge as a general term covering genetic algorithms and other, usually biology-inspired, adaptive computational methods. Meanwhile, accelerating in the 1990s, there was great progress in understanding actual molecular mechanisms in biology, and in figuring out how genetic and developmental processes work. But even as huge amounts of data accumulated, enthusiasm for large-scale theories of biology, that might for example address the production of complexity in biological evolution, seemed to wane. The discipline of systems biology attempted to develop specific, usually mathematical, models for biological systems, but there never emerged much in the way of overarching theoretical principles, except perhaps somewhat specifically in areas like immunology and neuroscience. One significant exception in terms of fundamental theory was Gregory Chaitin's concept, from around 2010, of metabiology, an effort, which I'll talk about a little bit, to use ideas from the theory of computation to understand very general features of the evolution of programs, and relate them to biological evolution. Starting in the 1950s, another thread of development, sometimes viewed as a practical branch of artificial intelligence, involved the idea of machine learning. Genetic algorithms were one of half a dozen common approaches; another was based
on artificial neural nets. For decades machine learning languished as a somewhat esoteric field, dominated by engineering solutions that would occasionally deliver specific application results. But then in 2011 there was unexpectedly dramatic progress in using neural nets for image identification, followed in subsequent years by successes in other areas, and culminating in 2022 with the arrival of large language models and ChatGPT. What hadn't been anticipated was that the behavior of neural nets can change a lot if they're given sufficiently huge amounts of training. But there isn't a good understanding of just why this is so, and just how successful neural nets can be at what kinds of tasks. Ever since the 1940s it has been recognized that there are relations between biological evolution and learning in neural nets, and having now seen the impressive things neural nets can do, it seems worthwhile to look again at what happens in biological evolution, and to try to understand why it works, not least as a prelude to understanding more about neural nets and machine learning. I also included some personal notes, a little bit of personal history, which people might find fun. So, it's strange to say, but most of what I've talked about here I should really have figured out 40 years ago. And I almost did, except that I didn't try quite the right experiments, and didn't have the intuition to think it was worth trying more. 40 years later I have new intuition, particularly informed by experience with modern machine learning. But even now, what made possible what I've done here was a kind of chance experiment, done for a somewhat different purpose. Back in 1981 I had become very interested in the question of how complexity arises in the natural world, and I was trying to come up with models that might capture this. Meanwhile, I'd just finished version one of SMP, the forerunner to Mathematica and the Wolfram Language, and I was wondering how one might generalize its pattern-matching paradigm to general AI. As it happened, right around that time neural nets gained some temporary popularity, and, seeing them as potentially relevant to both my topics, I started simulating them and trying to see what kind of general theory I could develop about them. But I found them frustrating to work with: there seemed to be too many parameters and details to get any clear conclusions, and at a practical level I couldn't get them to do anything particularly interesting. I decided that for my science question I needed to come up with something much simpler, and, as a kind of minimal merger of spin systems and neural networks, I ended up inventing cellular automata; only later did I discover that versions of them had been invented several times before. As soon as I started doing experiments on cellular automata, I discovered that they were a window into an amazing new scientific world that I've continued to explore in one way or another ever since. My key methodology, at least at first, was just to enumerate the simplest possible cellular automaton rules and see what they did. The diversity and complexity of their behavior was remarkable, but the simplicity of the rules meant that the details of successive rules were usually fairly different, and while there were common themes in their overall behavior, there didn't seem to be any particular structure to rule space. Occasionally, though, particularly in finding examples for exposition, I would look at slightly more complicated, multicolored rules, and I certainly anecdotally noticed that rules with nearby rule numbers often had definite similarities in their behavior. It so happened that around the time I started publishing about cellular automata, in 1983, there was a fair amount of ambient interest in theoretical biology, and, perhaps in part because of my work on cellular automata, I was often invited to theoretical biology conferences. People would sometimes ask about adaptation in cellular automata, and I would usually
just emphasize what individual cellular automata could do without any adaptation, and what significance that might have for the development of organisms. But in 1985 I was going to a conference at Los Alamos on evolution, games and learning, and I decided I should take a look at the relationship of these topics to cellular automata. Too quickly, though, I segued away from investigating adaptive evolution to seeing what kinds of pattern matching and other operations I might be able to explicitly set cellular automata up to do. Here's the paper I wrote at that time; actually, many aspects of it still seem quite modern, and in fact should probably be investigated more now. But even though I absolutely had the tools to do it, I simply failed at that time to explore what I've now explored. Back in 1984 I wrote a paper called "Twenty Problems in the Theory of Cellular Automata", and problem 7 was "How is different behavior distributed in the space of cellular automaton rules?" Over the years I'd occasionally think about cellular automaton rule space, wondering for example what kind of geometry it might have, particularly in the continuum limit of infinitely large rules. By the latter half of the 1980s, theoretical biology conferences had segued into artificial life ones, and when I went to such conferences I was often frustrated: people would show me simulations that seemed to have far too many parameters to ever be able to conclude much. People would also often claim that natural selection was a very simple theory, but as soon as it was implemented there'd be all kinds of issues and choices to be made, about population sizes, fitness cutoffs, interactions between organisms, and so on, and the end result was usually a descent into some kind of very specific simulation without obvious robust implications. In the mid-1980s I put a fair amount of effort into developing both the content and the organization of a new direction in science that I
called complex systems research. My emphasis was on systems, like cellular automata, that have definite simple rules but highly complex behavior. Gradually, though, "complexity" started to be a popular buzzword, and, I suspect partly to distinguish themselves from my efforts, some people started emphasizing that they weren't just studying complex systems, they were studying complex adaptive systems. All too often this seemed mostly to provide an excuse to dilute the clarity of what could be studied, and I was sufficiently put off that I paid very little attention. By the mid-'90s I was writing my book A New Kind of Science, and I wanted to use biology as an example application of my methodology and discoveries in the computational universe. In a section that I called "Fundamental Issues in Biology" I argued, much as I have here, that computational irreducibility is a fundamentally stronger force than natural selection, and that when we see complexity in biology it's most likely of computational origin, rather than being sculpted by natural selection. As part of that discussion I included a picture, which I'll show you here, of the often somewhat gradual changes in behavior that one sees with successive one-bit changes in a three-color nearest-neighbor cellular automaton rule. The printed version of the book was not in color, so you couldn't see the three colors there. But this wasn't done adaptively; it was basically just looking along a random straight line in rule space. And indeed, both on this page and in most of the book, I was concerned with what systems like cellular automata naturally do, not what they can be constructed or adaptively evolved to do. I did give constructions of how cellular automata can perform particular computational tasks, like generating primes. And, somewhat obscurely, in a section on intelligence in the universe, I explored finding nearest-neighbor rules that can successfully double their input. My reason for discussing these rules was to highlight the
difficulty of saying whether one of these cellular automata was constructed for a purpose, or is just doing what it does. That's the page about doubling cellular automata. Well, many years went by. There'd be an occasional project at our Summer School about rule space, and occasionally about adaptation. I maintained an interest in foundational questions in biology, gradually collecting information and sometimes giving talks about the subject. Meanwhile, though I didn't particularly internalize the connection then, by the mid-2010s, through our practical work on the Wolfram Language, I'd gotten quite up to speed with modern machine learning. Around the same time I'd also heard from my friend Greg Chaitin about his efforts, as he put it, to "prove Darwin" using the kind of computational ideas he'd applied in thinking about the foundations of mathematics. Then in 2020 came our Physics Project, with its whole formalism around things like multiway graphs. It didn't take long to realize that what I was calling multicomputation wasn't relevant just for fundamental physics; it was something quite general that could be applied in many areas, which by 2021 I was trying to catalog. I did some thinking about each of these. The one I tackled most seriously first was metamathematics, about which I finished a book in 2022. Late that year I was finishing a kind of 50-year-gestation project, informed by the Physics Project, on understanding the Second Law of thermodynamics, and as part of this I made what I thought was some progress on thinking about the fundamental character of biological systems, though not their adaptive evolution. And then ChatGPT arrived. In addition to being involved with it technologically, I started to think about the science of it, and particularly about how it could work. Part of it seemed to have to do with unrecognized regularities in human language, but part of it was a reflection of the emerging meta-discovery that somehow, if you bashed a machine learning system hard enough, it
seemed like it could manage to learn almost anything. Why did this work? At first I thought it must just be an obvious consequence of high dimensionality, but I soon realized there was more to it. As part of trying to understand the boundaries of what's possible, I ended up a couple of months ago writing a little piece exploring "Can AI Solve Science?". I talked about different potential objectives for science: making predictions, generating narrative explanations, and so on. Deep inside the piece I had a section entitled "Exploring Spaces of Systems", in which I talked about science problems of the form "can one find a system that does X?", and asked whether systems like neural nets could somehow let one jump ahead in what would otherwise be huge exhaustive searches. As a sideshow to this, I thought it might be interesting to compare with what a non-neural-net adaptive evolution process could do. Remembering Greg Chaitin's idea about connecting the halting problem to biological evolution, I wondered if perhaps one could just adaptively evolve cellular automaton rules to find ones that generated a pattern with a particular finite lifetime. I imagined it as a classic machine learning problem, with a loss function one needed to minimize. And so it was that just after 1 a.m.
on February 22nd this year, I wrote three lines of Wolfram Language code and tried an experiment. And it worked: I managed to find cellular automaton rules that would generate patterns living, in this particular case, exactly 50 steps. Well, in retrospect I was slightly lucky. First, that this ended up being such a simple experiment to try, at least in the Wolfram Language, that I did it even though I didn't really expect it to work. And second, that for my very first experiment I picked parameters that happened to immediately work: four colors, lifetime 50, and so on. And yes, I could in principle have done the same experiment 40 years ago, though without the Wolfram Language it wouldn't have been so easy. Still, the computers I had back then were powerful enough that I could in principle have generated the same results then as now. But without my modern experience of machine learning I don't think I would have tried, and I certainly would have given up too easily. And yes, it's a bit humbling to realize that I've gone so many years assuming adaptive evolution was out of the reach of simple, clean experiments. But it's satisfying now to be able to check off another mystery I've long wondered about, and to think that much more about the foundations of biology and machine learning might finally be within reach. And that's the end of the piece that I posted a week or so ago. All right, I think I have time for some questions here, so let me see what people would like to talk about. Lewis asks: would it be worth investigating this where fitness is expressed in terms of entropy rather than lifetime, some sort of proxy of entropy such as the size of a compressed representation of the evolution? What if we were to measure the entropy of a sliding window over the evolution; does the derivative tell us anything? Yeah, that's an interesting idea. One could look at the pattern one generates by running the cellular automaton rule and ask what its maximum
compressed size is, using some particular compression measure, like some kind of zip compression or something like that, or, for extra points, the amount of compression that a neural net could give it. Yeah, it's an interesting idea. It's going to end up being kind of another level above lifetime, because if you just let a pattern run for as long as it can, it'll often run for an infinite time. I suppose one could do this particular experiment by saying: run for 100 steps, take the pattern you get, see what the compressed pattern is, and use that as essentially a loss function, or a fitness function, and see how one would evolve with it. It's a good idea; I should try it. By the way, I might say that in the piece that I wrote, every picture can be clicked, and you'll get the Wolfram Language code to make that picture, so you can make your own versions of this. Let's see. lcq asks: to what extent does the organism modifying its environment as a strategy to optimize its own fitness, and vice versa, i.e. complex mutual interaction, limit the usefulness of organism-focused CA? Yeah, I wondered about that. I'd always assumed that the details of the environment, the whole structure of the environment, how different organisms interact with the organism one's looking at, and so on, that all of that would be crucial to being able to see what the essence of adaptive evolution actually is. To my surprise, that doesn't seem to be necessary. There's sort of a core phenomenon, which has to do with computational irreducibility meeting the coarseness of adaptive constraints and so on, that seems to be something one can investigate without all of that extra detail of organisms interacting with others. And, by the way, by looking at these multiway graphs and suchlike, one is seeing a window into sort of all possible organisms of a population and what they do. Let's see. tour asks: do you know any objective way to determine whether a given
feature is a result of drift versus adaptation, or is there always some interpretation involved? It's a good question. Whether some particular feature, some particular bit, is the result of fitness-neutral drift, or whether it is something that was sort of put there for a purpose, to achieve an objective, is a good and fundamental question. Certainly it's a good thing to look at the particular examples I have, where you can ask: does this particular structure arise and disappear as you neutrally drift, or does it only arise when you go through sort of forward evolution to achieve a better fitness, so to speak? That's a good thing to look at, and in fact one can imagine looking at that in the specific examples that I have. Let's see, where's a good example... Let me put up a rule here. We could just ask, for some of these rules, whether the particular kind of triangular areas arise and disappear in the neutral steps (this picture doesn't show the neutral steps), so that they sort of don't matter and you could have them or not, or whether they are things that sort of have to arise to make forward progress. I think that's the kind of thing one could investigate, and what's nice is that one has explicit examples here where one can explore things like that. Good, another good question there. ker asks: how about introducing topology to challenge the patterns of the cellular automata, so that different heights require different automata, so that competition can start on a 3D domain? Well, gosh. Yes, one could look at different rules kind of laid out on a grid, for example; in fact Greg Chaitin was just suggesting that to me, and it's a good experiment to try, to look at sort of
local competition between rules, in addition to just looking at a rule trying to achieve higher fitness. So you could say, of these two nearby rules on this grid, which one achieves higher fitness? I'm not sure how different the results are going to be from what we've already seen, because it's really a question of: are you constrained just by steps in rule space, or are you also constrained by steps in physical space? But it's a good thing to look at. lcq asks: to what extent can algorithmic information theory be used to characterize, and as a metric for, the optimality of biological evolution processes over these possible spaces and paths? Well, I don't know. What we've got here is rules with a certain number of bits that matter, and in a sense the number of bits that matter you can think of as being sort of the algorithmic information content of these rules. And what we're seeing is that with a very small number of bits that matter, we can achieve lots of fitness, so to speak. Now, Greg Chaitin has an approach that's based on machines with oracles, and much more sort of hypercomputational thinking. I think it's going to be a story of our long-running discussion about whether the universe is like Omega, his non-computable constant that represents the halting probability of a universal Turing machine (which just had its 50th anniversary, actually), or whether the universe is like pi, something explicitly computable. I think this is going to be the same kind of story, but I haven't quite unraveled how that works. Let's see. Elsie asks: what about the complex composite fitness of both organisms, population and the environment, where they mutually co-evolve, both defining and assessing what's going on? Yes, there are an infinite number of ways that one can make these models more complicated. The achievement that I was sort of excited about here was that we got something that really seems to capture
some core features of biological evolution with a model that we can kind of wrap our arms around. As soon as we start talking about co-evolution and things like this, we're dealing with something where we're going to have boatloads of parameters, much more complicated models. I have been saying this about biological evolution in general, so it's probably worth the thought to see whether there are similarly simple models that involve sort of multi-organism competition and so on. But I don't know that, and so far the surprise is that just one organism, one lineage, with pure mutation, is sufficient to show some pretty interesting and core phenomena, I think. Let's see. RS asks: is there a strict mathematical reason for using 26 dimensions (oh gosh, it sounds like string theory, doesn't it?), or can any power-of-two number of dimensions be used? The reason that number shows up is just because there are 3^3 = 27 possible cases in a three-color rule with nearest neighbors. We're cutting off one of those cases, because we want all white to turn to all white, so the background stays blank, and that leaves us with 26 cases in the rule. Then every mutation involves taking the color a case already has and going to one of the other two colors, so that's 2 × 26 = 52 possible directions altogether. It's 26 dimensions because there are 26 cases in the rule you can change; they're discrete dimensions, each with three possible values, so there are two possible positions you can go to away from the one you're already at, so to speak. This is just because of the particular cellular automaton rules I picked; with larger rules you get much larger-dimensional
spaces and so on. Let's see. Yes, someone comments that it's nice to see that one can compare the efficiency of biological evolution with exhaustive search. Yes, it is interesting. I was really excited about it, I am really excited about it; there's a lot more to study, but I sort of thought I would put out what I had figured out so far. LC comments that the fact that any of my presentations on a given topic will converge to a discussion of computational irreducibility is itself quite computationally reducible. I do tend to agree. One of my kids showed me some posts from somebody who noted that in a thing I wrote about the three-body problem and eclipse prediction, there was a discussion of computational irreducibility, which is highly relevant to the three-body problem, and they described this as a kind of principle of narrative equivalence, analogous to my Principle of Computational Equivalence: that in the end, anything I talk about ends up talking about computational irreducibility. I would argue that the reason for that is that it's a really important phenomenon; it's in a sense the banner phenomenon of the computational way of thinking about the world, and so it's really something that's going to show up whenever you have a computational way of thinking about the world. LC also comments that I might find it interesting that my work on Wolfram Language design can itself be the object of evolutionary analysis. Yes indeed, a good thing to study. I have studied it a little, and I have all the data to study kind of the evolution, under, well, I don't know if it's natural selection or human-imposed artificial selection, of the features of the Wolfram Language. It would be a good thing to study at some point. Natan asks: how have computational models influenced our understanding of evolutionary dynamics and
species interactions? There's a huge amount of literature on this subject, and there are lots of models that use, for example, two-dimensional cellular automata. Plants are easier than animals, because they don't move around, they just sit there, so you can say a plant interacts with its neighbors and so on, and there are lots of cool results about sort of the banding patterns in forests and things that you get from simple cellular automaton rules. There are a lot of results of that kind. I would say that my big-picture view is that those are mostly results about fairly simple properties of what happens, because one hasn't really had a good way to talk about sort of the more complicated things that happen. What I'm showing here, and I think it's a beginning, not an end, is: here are sort of complicated things that can happen, which you can see are visually complicated, from these very simple rules, driven by adaptive evolution. So, as I say, there's a huge amount of literature on computational models; people have used a bunch of models I've developed with cellular automata, people have used the Wolfram Language a lot, and I've been quite exposed to these things. But they're not really quite addressing these foundational questions in biology; they're more addressing questions of the form "we see this particular thing in this particular class of systems", and so on. Let's see. Natan asks: are there universal properties that all computationally irreducible systems share? Mostly that they are computationally irreducible, I'm afraid. There are many, many consequences of computational irreducibility, in terms of apparent randomness, lack of compressibility, kind of always generating surprises; there are all sorts of implications. It's kind of like, and closely related to, things like the law of entropy increase: the fact that things tend to get
to more random, higher-entropy states then has many, many implications, and so does the presence of computational irreducibility. 420 asks: what insights have simulations of artificial life provided in understanding the fundamental processes of natural evolution? Well, artificial life is sort of a great initiative. My own feeling is that people have tried to be too realistic in most of the simulations that have been done. They've tended to say, hey, let's simulate a whole rainforest, let's get 100 different species (and there are more than that in a rainforest), and let's study the way that this is a parasite of that, and so on and so on. They have quite realistic models, in a sense, which yes, actually do seem to reproduce what is seen in practice. But in a sense that's not too surprising, because you really put in the effects that are really happening in practice, and it's the interaction of those effects that gives the things you see. So in terms of these foundational questions, about sort of why does biological evolution work, why is there complexity in biological organisms, and so on, I think that's not been the focus of a lot of what's been done in what's generally referred to as artificial life; it's been more about exploring the more realistic examples. Now, that's probably not universally true. There are endless things that have been written, and a bunch of them, particularly in the earlier years, were very much based on cellular automata and on things that I'd explored in cellular automata, but again with a somewhat different emphasis. There were a lot of nice results; for example, Chris Langton had a nice result that sort of class 4 behavior was in between class 2 and class 3 if you looked at various different sequences of rules and so on. Okay, let's see, last question here. obss asks: I'm curious whether
there's more irreducibility or more reducibility in these rules; how can we measure the density of irreducibility or reducibility? I think any of these kinds of compression measures, of the form "to what extent can this particular procedure predict what's going to happen?", is a way to sort of explore reducibility, and it's a good suggestion you're making; it should be investigated. You know, we have our annual Summer School, and our summer research program for high school students, coming up in a few weeks, I guess. That's great fodder for some projects for students there, and I can't help but advertise those programs, because there's always a lot in them, I think, for everybody involved, certainly for me: I kind of invent lots and lots of projects for people, which people seem to have a lot of fun with, and I learn a lot from those people and from the projects that they do. All right, well, we should wrap up here. Thanks for coming, and I hope you find my efforts on biological evolution interesting; I wasn't expecting to write about it at this time, but I happened to chance upon some interesting directions. I'm hoping to apply these ideas to thinking about the foundations of machine learning, and hopefully I'll have something to say about that in due course. Also, some of what I've talked about here is relevant to questions about sort of why observers like us exist. I've talked about how observers like us inevitably observe certain laws of physics, but there are questions about why observers like us exist, and I think some of this work on adaptive evolution might shed some light on that, and on why there is this patch of rulial space that is us, and all the things we know about, and how that manages to arise and stay coherent. Perhaps the things that I've done here will shed some light on that. So anyway, thanks so much for joining me, and bye for now.
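The core experiment described in the talk, adaptively evolving cellular automaton rules toward a target pattern lifetime by single point mutations with a loss function, was originally three lines of Wolfram Language. As a rough illustration only, here is a minimal Python sketch of that kind of hill climb. All names, and the specific choices of three colors, target lifetime 50, and accepting neutral mutations, are assumptions made for this sketch, not the original code:

```python
import random

K = 3            # number of cell colors (white = 0 plus two others; the talk also mentions 4)
TARGET = 50      # desired pattern lifetime, in steps
MAX_STEPS = 120  # simulation cap; patterns surviving past this are treated as "infinite"

def all_neighborhoods():
    # 3^3 = 27 nearest-neighbor cases, as counted in the talk's "26 dimensions" answer
    return [(a, b, c) for a in range(K) for b in range(K) for c in range(K)]

def lifetime(rule, max_steps=MAX_STEPS):
    """Steps until the pattern dies out (all white), or None if it survives past max_steps."""
    width = 2 * max_steps + 3          # wide enough that the pattern never reaches the edges
    cells = [0] * width
    cells[width // 2] = 1              # single nonwhite seed cell
    for step in range(1, max_steps + 1):
        cells = [rule[(cells[i - 1], cells[i], cells[(i + 1) % width])]
                 for i in range(width)]
        if not any(cells):
            return step
    return None

def loss(rule):
    """Distance of the lifetime from TARGET; 'infinite' lifetimes are heavily penalized."""
    t = lifetime(rule)
    return abs(t - TARGET) if t is not None else MAX_STEPS

def evolve(steps=3000, seed=0):
    rng = random.Random(seed)
    # Start from the null rule: every case maps to white, so any pattern dies immediately
    rule = {nb: 0 for nb in all_neighborhoods()}
    mutable = [nb for nb in rule if any(nb)]   # the all-white case stays pinned to white
    best = loss(rule)
    for _ in range(steps):
        nb = rng.choice(mutable)
        old = rule[nb]
        rule[nb] = rng.choice([c for c in range(K) if c != old])  # single point mutation
        new = loss(rule)
        if new <= best:       # accept improving or neutral mutations...
            best = new
        else:
            rule[nb] = old    # ...and revert deleterious ones
    return rule, best
```

Calling `evolve()` returns the final rule and its loss; a loss of 0 means a rule was found whose pattern lives exactly `TARGET` steps. The null starting rule has lifetime 1, so loss 49, and by construction the accepted loss never increases along the run.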