My name is Jack Szostak, I'm a Professor of Genetics at Harvard Medical School, an Investigator at Massachusetts General Hospital, where my lab is, and I am an Investigator of the Howard Hughes Medical Institute. And in this part of my lecture, I'd like to concentrate on one aspect of the origin of cellular life, which is the chemistry of copying and replicating nucleic acid templates. So let's begin by looking at our schematic version of how we're thinking about a simple protocell, so again we have a two-component system: A primitive cell membrane encapsulating some genetic molecules, could be RNA, could be DNA, could be something related. In the previous lecture, we dealt with the growth and division of the membrane compartment, and this time we want to focus on the copying of templates to make a duplex product: The separation of strands, and the copying of those in a subsequent round so that you can distribute that information to daughter cells. And this is an essential aspect of the emergence of Darwinian evolution. There's some kind of informational polymer to code for heritable functions, could be anything that's useful for the cell, something that helps it to grow, something that helps it to divide, something that helps it to survive better in its environment, almost anything. But we need to have a way for that function to be coded and transmitted from generation to generation. Now, there are two general ways of thinking about the process of nucleic acid replication: The first would be some kind of enzymatic or catalytic process, so the classical example of that would be an RNA replicase, an RNA molecule that's an RNA polymerase that's good enough to copy its own sequence. I worked on that for a number of years. Many people have followed up on that work, and there's been a lot of progress made. I do think that eventually we will have molecules that can do that, but at the present time, we're still far from such a solution. 
And that has driven us to rethink the process and to step back and look at chemical processes that might lead to the replication of nucleic acid templates, whether RNA or some related molecule. So that's what the main focus will be on: Chemical processes leading to efficient copying and replication of templates. Now, there are two really critical factors that apply in any such system, and these are that both the rate of copying and the accuracy, the fidelity, of copying have to exceed critical thresholds. So the rate of replication has to be faster than the rate of degradation, and with a molecule such as RNA, that's a really important factor, because RNA is such a delicate polymer, hydrolyzes relatively rapidly. That imposes a lower limit on an acceptable rate of replication. In addition, the fidelity of that process has to exceed the Eigen error threshold. If we want to propagate useful information from generation to generation, the accuracy of that copying process has to exceed a threshold, which is basically related to the reciprocal of the number of important nucleotides for whatever function we're talking about. In practical terms, that means we probably need to think of accuracies or, shall we say, error rates below a few percent. Typical chemical processes that we study now, the error rates are in the range of 5% to 10% or 15%, so some improvement is required. And we would also like to see improvement in the rate of replication, so that we can do these experiments on a reasonable laboratory timescale. So, the building blocks that we're going to use for these experiments are quite different from the nucleoside triphosphates that are used in all modern cells. So these are modern substrates. They are kinetically trapped in a high-energy state so that they require very sophisticated catalysts, enzymes that can confer the 10^12 rate acceleration that you need in order to make effective use of this kind of substrate. 
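As a rough illustration of the two thresholds just described, here is a minimal Python sketch. The function names and the example rate constants are hypothetical, chosen only for illustration; the one relationship taken from the lecture is that the tolerable error rate scales roughly as the reciprocal of the number of important nucleotides, and that copying has to be faster than degradation.

```python
def max_error_rate(important_nucleotides: int) -> float:
    """Rough error threshold: about 1 / (number of important nucleotides)."""
    if important_nucleotides <= 0:
        raise ValueError("need at least one important nucleotide")
    return 1.0 / important_nucleotides


def error_free_fraction(error_rate: float, important_nucleotides: int) -> float:
    """Chance that every important position is copied correctly,
    assuming independent errors at each position (a simplification)."""
    return (1.0 - error_rate) ** important_nucleotides


def copying_outpaces_decay(copy_rate: float, decay_rate: float) -> bool:
    """Rate criterion: replication must be faster than degradation."""
    return copy_rate > decay_rate


if __name__ == "__main__":
    # Hypothetical functional sequences with 30, 50, or 100 important positions.
    for length in (30, 50, 100):
        limit = max_error_rate(length)
        print(f"{length} important nucleotides -> error rate below ~{limit:.1%}")

    # Compare a 10% error rate with a 2% error rate for 50 important positions.
    for rate in (0.10, 0.02):
        fraction = error_free_fraction(rate, 50)
        print(f"error rate {rate:.0%}: {fraction:.1%} of copies are error-free")
```

With a few tens of important nucleotides, the threshold lands at a few percent, which is why error rates of 5% to 15% call for some improvement.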
These modern substrates are also, as I've mentioned before, highly charged because of the triphosphate group, which keeps them from leaking out of cells, which is a good thing for modern cells but a bad thing for primitive cells, which will require their substrates to come in from the environment, get across the membrane spontaneously. And that means we would like to think about less polar substrates. And so that drives us to think of molecules similar to those you can see down here. These are nucleoside phosphorimidazolides. These kinds of molecules were first synthesized by Leslie Orgel and his students and colleagues, and studied in quite a lot of depth in the 1970s and 80s and 90s. These molecules are much more chemically reactive, we have a much better leaving group, so they're intrinsically more reactive. They can spontaneously polymerize and copy templates without enzymes, and they're also less polar, which means they can get across membranes much more rapidly, without any transport machinery being invoked. So, the early work by Orgel and his colleagues got us partway to copying systems, but there were a series of problems that they ran into, which we'll come to and try to consider individually. But first, I want to step back a bit and think about how these kinds of molecules would've been generated on the early Earth. And that's really a pretty major problem. It's still a major research area, but I think very exciting progress has been made recently. So in the early days, there were self-assembly processes that were discovered that seemed to suggest that the solution might be really easy. In fact, the classical formose reaction, going back to Butlerov, showed that you could make sugars by polymerizing formaldehyde. And so for example, ribose can be viewed as an oligomer of formaldehyde. Five formaldehyde molecules can self-assemble in a series of steps to give you ribose. The problem is making just ribose. 
In fact, in this kind of chemistry, typically you'll end up with dozens or even hundreds of products. So, part of the problem that's absorbed people has been how to make just the right sugar. A similar problem comes from thinking about the nucleobases. So early on, Juan Oro did very dramatic experiments showing that he could actually make adenine from cyanide. Simply boiling a solution of cyanide gave you some adenine, along with a lot of other products. But it's very striking that adenine can be viewed as a pentamer of hydrogen cyanide. So again we have the same problem of how do we get just the building blocks we want (the A, G, C, and U), as opposed to all of the other related heterocycles that come out of this kind of chemistry. Now, when people like Orgel and many of his colleagues started to look at the synthesis of pyrimidines, again it looks, superficially, very easy. For example, a cytosine can be viewed as the product of reaction of cyanoacetylene and urea. And in fact there are variations on this chemistry that are extremely efficient. So it was starting to look like you could make sugars, you could make nucleobases, maybe it would actually turn out to be easy to make nucleosides and nucleotides and get us all the way to RNA. But it turned out that there was a major problem, even apart from the problem of making just the molecules we want. If you could have ribose and, say, cytosine, you would need to join them together by making the glycosidic bond that links them, and that chemistry basically just doesn't work, no matter how hard people tried, this was a roadblock for a long time. 
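The statements above that ribose can be viewed as an oligomer of five formaldehyde molecules, and adenine as a pentamer of hydrogen cyanide, can be checked with simple atom counting. The short Python sketch below does only that bookkeeping; the helper name `oligomer` is made up for this illustration, and the check says nothing about the actual reaction pathways or the many side products.

```python
from collections import Counter

# Molecular formulas written as element counts.
FORMALDEHYDE = Counter({"C": 1, "H": 2, "O": 1})   # CH2O
HYDROGEN_CYANIDE = Counter({"C": 1, "H": 1, "N": 1})  # HCN
RIBOSE = Counter({"C": 5, "H": 10, "O": 5})        # C5H10O5
ADENINE = Counter({"C": 5, "H": 5, "N": 5})        # C5H5N5


def oligomer(monomer: Counter, n: int) -> Counter:
    """Atom counts of n monomer units combined with no atoms gained or lost."""
    return Counter({element: count * n for element, count in monomer.items()})


if __name__ == "__main__":
    print("5 x CH2O matches ribose: ", oligomer(FORMALDEHYDE, 5) == RIBOSE)
    print("5 x HCN matches adenine:", oligomer(HYDROGEN_CYANIDE, 5) == ADENINE)
```

The formulas balance exactly, which is what makes these routes look so attractive on paper; the difficulty described above is selectivity, not stoichiometry.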
So one of the most exciting advances in prebiotic chemistry in recent years has come from the laboratory of John Sutherland in the UK, who basically followed up on much earlier work of several other labs, and showed that there's an alternative pathway that can get you to the final product without ever having to make this particular glycosidic linkage by joining together a base and a sugar. And the solution basically comes by making this intermediate, 2-aminooxazole, from cyanamide and glycolaldehyde. In a series of very simple and actually remarkably efficient steps, this intermediate can be elaborated into cytosine, and then deamination can give you U. So it looks like there might be a reasonable pathway to at least getting to the pyrimidine nucleosides. The synthesis of the purine nucleosides by an analogous pathway is a topic of active research. And if it turns out that there is a similarly efficient pathway, that will certainly be very satisfying in the sense of providing very efficient and regiospecific chemistry that gives us a restricted set of building blocks leading up to RNA. Now, there are still many gaps in our understanding of how we would make pure, concentrated starting materials. There are some steps leading up to activated nucleotides that are far from clear. So there's a lot of work to be done, but I think this new chemistry has really advanced the field a lot. So, let's skip those missing link steps for the time being, and assume that we can make activated nucleotides. The next problem we have to think about is their polymerization into RNA chains, and how that could happen. Well, here, we're in a good situation, because we actually have two solutions to the problem. The first is the finding of Jim Ferris and colleagues, including Leslie Orgel, and this is a finding that a common clay mineral, montmorillonite, illustrated here, is a really effective catalyst for that kind of polymerization. 
So this clay mineral is a layered hydroxide, aluminosilicate, and in between the layers there are water molecules. And organic molecules also tend to accumulate in the inner layers of the clay mineral, and as they accumulate there and become concentrated and oriented relative to each other, their polymerization is catalyzed. And the nice thing is that this is not the only way of doing it. You can get essentially the same result simply by taking a solution of these activated nucleotides, these phosphorimidazolides. And as a dilute solution at room temperature, almost nothing happens, they don't polymerize. But if you put that solution in the freezer and allow the water to freeze and generate ice crystals, what you see is that the solutes, including these nucleotides, get concentrated in thin layers in between the ice crystals, and when molecules are concentrated that much, even at low temperature, they can start to react with each other, and you will see the spontaneous formation of long RNA chains as a result of freezing. So this is very nice, because now we have two plausible, natural scenarios where either a common mineral or just the process of freezing could generate RNA chains. So the next problem we have to deal with, assuming we can make sets of essentially random RNA polymers, how could they be copied? And so here is where we run into a fresh set of difficulties. So the partial copying of RNA templates has been known for decades, as I said, from the work of Leslie Orgel and his colleagues. This is an example of that kind of chemistry done by David Horning, when he was an undergrad in my lab. We start off with a substrate here, a guanosine nucleoside, activated as a phosphorimidazolide, and we supply that to a primer-template complex in which the template contains a region of Cs where the G nucleoside can bind and result in primer extension. 
So here would be the starting material, and then over time we want to see the incorporation of Gs to elongate the primer. And that process is shown down here in timecourse, where we start off with just the primer, and then over the first, say, six hours, we observe the incorporation of the first nucleotide, and then over the next day or two, we see the second nucleotide come in, but even after two days, there's very little of the third nucleotide. So that illustrates the first problem: This process is intrinsically rather slow. And because this chemistry requires a very high concentration of magnesium to catalyze the reaction, this rate of synthesis is actually on the same timescale as the rate of degradation of the RNA template. So that's a problem. There are other problems, you can see, in the structure of the ribonucleotide, that there are two hydroxyls on the sugar, and either of them can react to generate either the correct 3'-5' linkage or the incorrect 2'-5' linkage. And so that means you inevitably get a mixture of linkages, some of which are the natural, correct linkage found in RNA, and others which are not. The next slide really summarizes the numerous problems or challenges that must be solved if we're ever to think of a complete chemical process for the replication of RNA. So, we begin with these problems of rate and fidelity. The fidelity is actually closely related to the problem of rate. It turns out that, when you make a mistake in incorporating a nucleotide, so as a chain is growing, you put in the wrong base, make a mismatch, the addition of the next base can be dramatically slowed. We call this the stalling effect, and therefore it slows down the overall rate of synthesis. If we could make the chemistry more accurate, the rate of synthesis would be much better. There's the regiospecificity problem that I mentioned, 2' versus 3' linkages. There is a problem that we need very high concentrations of these monomers. 
They apparently need to be very pure, so if you have other kinds of nucleotides in there, say, with different sugars or different stereochemistry, they will also get incorporated, and that will mess up the product that's made. Also, these monomers, when they're activated as imidazolides, are rather unstable, they're quite susceptible to hydrolysis or cyclization, so those are undesired side reactions. They could be solved if we had the right kind of chemistry to reintroduce the activated state, but that's something that's missing so far. The requirement for a very high concentration of magnesium is extremely problematic both because it's geochemically unrealistic and because it leads to RNA hydrolysis. There's another problem with RNA, which is that, even if you could replicate a strand of any significant length, the melting temperature of that duplex is so high that it's almost impossible to pull the strands apart. Thermally, it becomes impossible to melt them. Even if you could melt them, the strands will come back together again extremely rapidly. This rapid reannealing rate is something that will compete with the much slower template copying chemistry, so this is another problem that has to be solved. And finally, in our experiments, we use primers and watch them grow, simply because that's analytically very easy to do, but of course there weren't primers around on the early Earth, and so we need to think of a primer-independent process for copying templates. So all of these problems together have made it very difficult to think of a plausible pathway for the overall replication of RNA templates. So, what we decided to do was essentially to step back from this and think about other polymers, maybe it would be easier to replicate something else. And in the process of figuring that out, maybe we would get clues that would let us come back to RNA and think about how to solve some of these problems. 
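The link between fidelity and rate described above, the stalling effect, can be made concrete with a toy calculation. The Python sketch below is only an illustration under stated assumptions: every correct addition takes the same time, each addition is a mismatch with a fixed independent probability, and a mismatch slows only the very next addition by a fixed factor. The function name, the stall factor of 100, and the error rates are hypothetical and are not measured values.

```python
def expected_copy_time(template_length: int,
                       error_rate: float,
                       stall_factor: float,
                       step_time: float = 1.0) -> float:
    """Expected time to add `template_length` nucleotides.

    Each addition normally takes `step_time`. If the previous addition was a
    mismatch (probability `error_rate`), the current addition takes
    `stall_factor` times longer. The first addition is never stalled.
    """
    if template_length < 1:
        raise ValueError("template_length must be at least 1")
    stalled_steps = (template_length - 1) * error_rate
    return step_time * (template_length + stalled_steps * (stall_factor - 1.0))


if __name__ == "__main__":
    length = 15          # hypothetical template length
    stall = 100.0        # hypothetical slowdown after a mismatch
    for error_rate in (0.15, 0.05, 0.01, 0.0):
        time_units = expected_copy_time(length, error_rate, stall)
        print(f"error rate {error_rate:.0%}: ~{time_units:.1f} step times")
```

In this toy model, lowering the error rate shortens the expected copying time even though the intrinsic speed of each correct addition is unchanged, which is the sense in which more accurate chemistry would also be faster chemistry.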
Now in fact, Leslie Orgel concluded a couple of decades ago that, even though RNA replication looked really hard, he thought that the replication of some kind of informational polymer would be achieved fairly readily, and that in the process of doing that, we would learn something about either RNA replication or how to replicate something that might be relevant to the origin of life. Now, unfortunately, despite that challenge to the chemistry community, few people have addressed the problem, and there is still no example of the chemical replication of any informational polymer. So I think this is a major challenge. It's a really interesting and fun thing to investigate. And this is really the focus of a lot of our attention at the moment. So what can we look at? What would other interesting nucleic acids be? So, what I'm going to do is just show you a set of the kinds of molecules that we've been studying in my lab over the last couple of years. And we're concentrating on phosphoramidate-linked genetic polymers, so these have nitrogen-phosphorus bonds in place of the oxygen-phosphorus bond you see in normal phosphodiesters. The reason for that is that the building blocks, the monomers, for making these polymers are aminosugars, so we now have a much better nucleophile than a hydroxyl group, so that speeds up the chemistry again, giving us another boost in rate. So, here are three phosphoramidate backbones. Here's an acyclic, open-chain backbone with essentially a glycerol nucleic acid backbone. Here is a 2'-5'-linked polymer, so the phosphoramidate version of DNA, with 2' linkages. And here's the molecule closest to DNA. All that's been done is to change the normal oxygen atom here to an NH group, so this is phosphoramidate DNA. And then there are two other molecules that have captured our attention recently, and these are somewhat more conformationally constrained molecules. 
So this is the phosphoramidate version of TNA, threose nucleic acid; here, the sugar is a four-carbon sugar, threose. And here we have a 2' linkage. These molecules were first made in the Eschenmoser Group and studied. They're perfectly good base-pairing systems. And finally, here you see a morpholino backbone, another conformationally constrained backbone. So in the case of the threose, the conformational constraint comes from the fact that there are only five atoms in the backbone repeat unit, so there's one less rotatable bond, so it's entropically constrained. Here, the constraint is different. It comes from the fact that we have the six-membered morpholino ring, which likes to sit in this chair conformation, so it's conformationally constrained in a very different manner. So, what we've been trying to do is systematically study all of these different kinds of templates and see if we can learn anything from this process that might eventually feed back and teach us about RNA replication. So, at this stage, we're really still heavily involved in studying the copying of these templates. In many cases, these are actually quite challenging to prepare synthetically, so that takes a lot of time and effort. But I'll take you through what we've done so far. So we began by studying the simplest template from a structural point of view, so this is the glycerol nucleic acid backbone. So no cyclic sugar, just an open-chain backbone. And here's the corresponding monomer. So we have the amino nucleophile, we have the activated phosphate, these look very simple from a structural standpoint, but in fact, there's a major problem: That lack of constraint from the cyclic sugar means that the amine nucleophile can directly reach the phosphorus electrophile, and as a result, the activated monomer cyclizes to give this useless product here faster than we can measure. So that tells you right away that this system is not chemically a good system to look at in terms of replication. 
But, the speed of this reaction actually told us something kind of interesting, which is that the intrinsic chemistry can be very fast. So, if you can position the nucleophile in just the right position and orientation relative to the electrophile, polymerization could in principle go very rapidly, even without an external catalyst. The system that we've actually spent the most time studying so far, and learned the most about, is the 2'-5'-linked phosphoramidate version of DNA, so here's the polymer. A series of 2'-5' phosphoramidate linkages, and here's the corresponding monomer, shown as the G nucleoside version, but with an amine nucleophile. It's a phosphorimidazolide, so good leaving group, good nucleophile... we should get very fast polymerization chemistry. And in fact, that's exactly what we observed. In our first experiments, after we learned how to make these molecules, we set up the following system, where we have a primer-template complex. The template contains this region of Cs, where the G monomers can bind, and we can then observe the primer being elongated by the sequential incorporation of multiple G residues. The result experimentally is shown here. We start off with the primer, and over the course of hours, we can see the complete copying of the template and the accumulation of the full-length, extended primer. So this reaction is so efficient that, I think, if you didn't know this was just chemistry, you would think this was an enzymatically catalyzed reaction, but there's no enzyme, there's no polymerase, this is just the intrinsic chemistry of activated nucleotides binding to a template and extending a primer. So, if we could do this in a more general way, so that we could copy templates of arbitrary sequence, we would basically have the kind of system that we want. We would be able to copy sequences that could carry out functional tasks. 
One of the nice things about this overall system is that, because of the 2'-5' linkages, the duplex that's formed has a relatively low melting temperature, it's relatively easy to thermally separate the strands, and we could imagine a cycle of complete steps of copying and full replication. So, unfortunately, things are not so simple. This copying chemistry works very well with C templates driving G incorporation, works very well with G templates driving C incorporation, but when we went to try to copy templates that contain As and Us, it basically didn't work at all. So we assumed that the problem was that the AU base pair is much weaker than the GC base pair, which is true. And so the solution was simply to go back to the chemical drawing board and look at different nucleobases that make an AU-like base pair, but one that is just as strong as a GC base pair. So it turns out, in fact, this has already been done in other contexts. So, people like Chris Switzer, for example, have looked at the base pair made between D, which is short for diaminopurine, and propynyl-U, which you see over here. So we have a propynyl group at the 5 position of U, this contributes extra stacking energy. The extra amino group in diaminopurine gives us back the third hydrogen bond, and this base pair in the context of DNA is essentially just as strong as a GC base pair. So, we made the corresponding activated monomers, and sure enough, it solves the rate problem. Using this activated propynyl-U nucleotide, we can now copy a template consisting of four D residues, and in fact, it's very fast. The reaction's finished within the first ten minutes. We can copy, using the activated D monomer, a template consisting of propynyl-Us, it's a little bit slower, but still mostly done within an hour. Not too bad! So we thought, okay, maybe we've really solved the problem. 
Let's get a little bit more ambitious and try to copy progressively longer template sequences that include progressively more, different nucleotides in the sequence. So, the first step looked pretty good. Here we have a template that consists of three Ds and three Cs, so we're incorporating three propynyl-Us followed by three Gs, and the reaction goes pretty well, within a few hours. If you leave out the G, you stall where you should, if you leave out the U, you basically stall almost at the beginning. So that looks encouraging. You do see a few shorter products here, which made us worry a little bit about the accuracy of the overall process, but it's not too bad. As we go to longer templates, so here a mix of Gs and Cs, we can still copy the whole thing, but it does take longer, and you do see more of these intermediate sequences accumulating. And that actually gets much worse, when we go to an even longer sequence of 15 or 16 nucleotides incorporating all four bases in the template. And now we still get some full-length product, but it takes a long time to accumulate, and we see a lot of these stalled intermediate products accumulating over the course of the reaction. So we don't know for sure, but we suspect that these stalled intermediates are the result of mistakes in the template-copying process, such that a chain is growing, a mismatch is formed by a mistaken incorporation, and that drastically slows down the subsequent polymerization. So in fact, we think that, in order to get more efficient copying, in order to speed up the overall reaction, we need to solve the fidelity problem, and that that might help solve the rate problem. So how could we do that? Well, we could look at different nucleobases, maybe our choice of D and propynyl-U was not so great. In fact, there are chemical reasons to think that that's true. In particular, the propynyl group on U changes the pKa of its N1, which can lead to the formation of other tautomers and mismatched base pairs. 
We could look at other backbones, which we are doing. So there's the possibility that, if the backbone is conformationally constrained in just the right way, it will favor the incorporation of the right bases and disfavor the incorporation of mistaken bases. We could also consider looking at oligonucleotide substrates, which actually turns out to be a really good idea. There are probably a lot of reasons why this would be helpful. And we could also consider looking at catalysis, either ribozyme-mediated catalysis or perhaps catalysis by small molecules or short peptides that might've been lying around, and that's another approach that we're starting to take. So, let's go back to this idea of looking at different nucleobases. So, it turns out there's actually a very simple substitution that looks extremely promising at this stage. So here's the D-propynyl-U base pair which we think is causing problems with fidelity. An alternative is to just replace U with 2-thio-U, so U with a sulfur in place of the oxygen normally at the 2 position. This is an analogue of U that's actually found in nature, in modern biology, it's a common substituent in tRNAs, where it plays the role of stabilizing an AU base pair and increasing the fidelity of that interaction. And the reason that works is because the much larger and polarizable sulfur contributes to stacking interactions. The larger size is accommodated in a base pair with A, but is not accommodated in a wobble base pair with U. So it seems to both stabilize the AU base pair and disfavor the incorrect wobble base pairing. So, we have preliminary experiments that suggest that this is a promising approach, and we're continuing to look at that. Meanwhile, we're also looking at a number of these other backbones, and so here we come back to the 3'-linked phosphoramidate version of DNA. Here is the corresponding monomer. We like this system for different reasons than we like the 2' system. 
Here we're making the natural 3'-5' linkage, so we're a little bit closer to making an RNA-like product. In fact, duplexes of this 3' phosphoramidate version of DNA have a very RNA-like geometry, so that's kind of nice. These monomers can cyclize, so this amino group can reach the phosphorus, but that reaction is fairly slow, so it's not a fatal problem. What may, in the long run, be more of an issue, is that those duplexes have a very high melting temperature. Nonetheless, it's an interesting system to study because this chemistry seems to go very effectively, and in fact, we see efficient incorporation of all four natural nucleotides using this backbone. So it shows that there's a very important coupling between backbone chemistry and the bases that are involved in forming the Watson-Crick paired structure of the duplex. We don't really fully understand the nature of that coupling. I mentioned before two conformationally constrained phosphoramidate-linked nucleic acids, the threose nucleic acid (TNA) and the morpholino backbone, which we abbreviate as MoNA. These are very interesting systems, and the hope here is that conformational constraint might be a way of increasing both the accuracy and the rate of chemical copying. So why do we really think that? Here's an experiment done by Jason Schrum when he was a graduate student in the lab, and we're here looking at the incorporation of 2' amino nucleosides, extending primers where the template is composed of a series of different polymers. So DNA over here, RNA template, an LNA template (LNA stands for "locked nucleic acid," this is a relative of RNA where the sugar conformation is locked into the RNA-like conformation by a cross-link underneath the sugar), and over here is a 2'-5'-linked DNA template. 
So what you see in the timecourse of this primer extension reaction, is that the reaction goes fairly slowly on a DNA template, a lot more rapidly on an RNA template, but even more rapidly on the conformationally constrained LNA template. So this was our first real experimental hint that conformational constraint of a template could really have a useful and significant effect on the rate of polymerization. And so that's encouraged us to go ahead and make these conformationally constrained templates, even though the synthetic chemistry is rather challenging. So here is the threose nucleic acid backbone, so again a four-carbon sugar. Here's the corresponding monomer. There has been some structural work done from the Egli Lab. You can see here that a TNA duplex looks, at a gross level, very similar in overall geometry to an RNA duplex. So this, I think, is encouraging for the possibility that this constrained backbone might help to position the incoming nucleoside correctly, so as to speed up polymerization and potentially make it more accurate. So we hope to do those experiments over the next few years. Here is the morpholino phosphoramidate backbone, again conformationally constrained in a very different way because of the six-membered morpholino ring. These molecules are actually much easier to make than the TNA molecules. And so we've been able to start looking at the copying of morpholino templates by morpholino monomers, so we think this is another very promising system in which to investigate experimentally the effects of conformational constraint on the rate and fidelity of copying. So, even though we're still far from having a complete chemical system that could drive the replication of any nucleic acid or any genetic material, we can use what we've found so far to learn about the compatibility of the chemistry of genetic replication with our replicating vesicle systems. 
So we can do experiments where we encapsulate nucleic acids inside vesicles and look at the copying chemistry. So an example of that is shown here. This was work done by Sheref Mansy and Jason Schrum and other people in my lab, a few years ago. The basic experiment is to take the same kind of primer-template complex that you've seen before, and monitor the extension of the primer by a template-directed synthesis. But this time, the primer-template is inside one of these vesicles, so we're going to add the activated monomer to the outside. It has to get across the membrane spontaneously, without any transport machinery, to get to the inside, where it can do this template-copying chemistry. So, here's the experimental result: On this side, you see the control reaction done in solution, you see the accumulation of full-length material over 12 to 24 hours. Here is the same experiment with an encapsulated primer-template, and you can see that the reaction is slowed down a little bit, but still by 24 hours you can see the accumulation of mostly full-length product. So that was actually extremely encouraging for us. This was a real major advance, because it said that, yes, the chemistry is compatible; these building blocks can get across the membrane; when they get to the inside of the vesicle they can copy templates; and once we developed a more general template-copying chemistry, we should be able to combine it with the replicating vesicles and have the composite system that has been our goal all along. Now, in this experiment, the membrane was made from a convenient laboratory system, these unsaturated C14 fatty acids. We can do the same experiments with a much more prebiotically realistic mixture of fatty acids, fatty alcohols, their glycerol esters, saturated ten-carbon chains, and we see the same thing. Essentially over a period of 12 to 24 hours, the nucleotides can get across these membranes and copy templates. 
In contrast, when the vesicle is made from more modern molecules, from phospholipids, those are a complete barrier to the penetration of these nucleotides, so nothing happens. You see no primer extension. So what this is telling us is that, for this to work, for this protocell model to work, you need the membrane to be made of the right kinds of primitive molecules, so simple fatty acids and related molecules, and the nucleotides have to be the right kind of primitive molecules, not triphosphates, but something like phosphorimidazolides, something less polar. All right, so with all of this work, are we actually any closer to coming back to RNA with new ideas for complete replication? So, I think that we are, and I'll tell you about one of the conceptual advances that we've made recently. This actually comes from a selection experiment that was done by Simon Trevino when he was a graduate student in the lab, and it addressed this question of monomer homogeneity: How important is it really in a prebiotic setting that the monomers be really pure and concentrated, so that we don't make backbones that have different kinds of linkages, different kinds of sugars, etc.? So the experiment that we could do in the lab is a little bit more limited, but what Simon worked out was a way of taking a library of DNA sequences and transcribing that into molecules that are not just RNA, but a mixture of RNA and DNA. In fact, in every position in these transcripts, there's roughly a 50-50 chance of that linkage being a ribonucleotide or a deoxyribonucleotide. So we have extreme backbone heterogeneity, ribo- and deoxyribonucleotide linkages, and that variation is not heritable. The experiment was to take this library and then select for functional molecules; we select for aptamers just by binding to a target molecule. The targets used were ATP and GTP, because we'd done this many times years ago. 
It's easy to evolve RNA molecules or DNA molecules that specifically recognize these nucleotides, but in this experiment, the pool isn't pure RNA or DNA, it's this mixed backbone polymer. So what Simon found is that he could go around cycles of selection and amplification. Every time we do the amplification, we reintroduce this backbone heterogeneity, but of course the molecules get shuffled. The exact order of ribo- and deoxyribo- linkages is randomized at every round of the selection process. Nonetheless, after a few rounds of selection, Simon was able to obtain aptamers that bound to their target with great specificity. They weren't quite as good in terms of affinity as the aptamers we get from pure RNA or pure DNA, but they still work. So this told us that maybe monomer heterogeneity wasn't as important as we'd been thinking. Maybe you could actually evolve functional molecules, ribozymes, in the face of nonheritable backbone heterogeneity. So why is that important in the context of RNA? Well, one of the big problems with RNA is this regiospecificity, the fact that, in a chemical system, it seems almost unavoidable that some fraction of 2'-5' linkages will be formed. I think Simon's experiment hints that this may still allow for the evolution of ribozymes, so this is something that needs to be experimentally investigated, something we're doing now. Now, if that turns out to be true, and you can still get functional molecules in the face of this backbone heterogeneity, then the important implication comes from the fact that we already know the 2'-5' linkages in the backbone drastically lower the melting temperature of an RNA duplex. And so, as a result, it would now become possible to thermally separate the strands after the copying of an RNA template. 
So, it's possible that this 2' versus 3' heterogeneity that we used to think was such a huge problem with RNA, is actually what allowed RNA to work as the primordial genetic material, because it allows for thermal strand separation, and therefore, the repeated cycles of template copying and strand separation that give you overall replication. So this is the kind of thing that we're actively studying. A few more points about more primitive scenarios for template copying... All of the work that we've done, and many other people have done over the decades, has tended to focus on primer extension reactions with monomers, because this is a very simple and analytically tractable approach to the problem of template copying. You get a lot of information, it's easy to analyze the products by simple methods such as gel electrophoresis, but it's probably completely unrealistic as a prebiotic scenario. So what we're being driven to think about is template copying by mixtures of short, random-sequence oligomers, along with monomers, dimers, etc. And so it's a much messier situation, you have a large number of different types of substrates, the number of partial products of template copying becomes enormous, and so the analytical problem gets much worse. But nowadays, we have much more advanced analytical techniques, and with advanced methods of mass spectrometry, we can actually hope to analyze these kinds of reactions and perhaps we'll see that this kind of system gives unexpected benefits. We have the possibility of nucleating the copying chemistry at multiple sites, and the incorporation of oligomers means that fewer catalytic or chemical steps are required, so we're very excited about following up on this kind of "messier" but more natural scenario, in the hope that this will actually lead us closer to a realistic scenario for full replication. 
So I just want to end by pointing out that, in this much messier scenario, there are completely new ways in which we can think of ribozymes, catalytic RNAs, contributing to the overall process of replication. Up till now, we have exclusively thought about RNA-catalyzed RNA replication occurring through RNA catalysts that are RNA polymerases. But in these scenarios, I think that it's actually possible that the primordial replicase might've been a nuclease. For example, suppose an oligomer binds and then gets extended, but a mistake is made and you get a mismatch. As we've discussed, that slows down subsequent primer extension, and that can be a drastic effect. So, if there was a ribozyme nuclease that could chew off that mistake, it would allow chemical copying to go back to normal. So that would be one way of speeding the process up. Another scenario comes from thinking about the use of oligonucleotide substrates. It could be that overlapping oligonucleotides bind to a template, and so this would be a kind of a dead-end situation, unless you have a nuclease that can come along, trim away the overlap, and allow chemical ligation to complete the process of template copying. So, I think these kinds of changes in the way we're thinking about the process have really opened up a lot of new experiments and have made me very optimistic about the possibility of attaining a complete replication system, either purely chemically, or by a combination of chemical and RNA-catalyzed reactions. So, just to sum up then, I think that these considerations tell us that monomer purity may not be as important as we'd been thinking. It's possible that some backbone heterogeneity may not be fatal. The incomplete regiospecificity may be fine; 2'-5' linkages may solve the melting issue for RNA. And we're very excited about studying 2-thio-U as a simple, prebiotically plausible nucleotide substitution, something that might enhance both rate and fidelity. 
So by putting all these things together, we're hopeful that over the coming years, we'll eventually converge on a complete chemical system for the replication of either RNA or maybe some related polymer. And if we can get to that point, combining it with the replicating vesicle system should allow us to observe the spontaneous emergence of Darwinian processes from a purely chemical system. And that's really the major goal of this whole thing, and the part of the project that's most relevant to the emergence of biology from the chemistry of the early Earth. So, again, I've tried to mention people as I've gone along. In terms of the chemistry, many people have contributed to this over the years: Jason Schrum, Alonso Ricardo, Matt Powner, Na Zhang, Ben Heuberger, Craig Blain, Shenglong Zhang. Many people have contributed to this work, and so they've played a very important role in developing all of these new ideas that are leading us, hopefully, towards a solution to this major problem in thinking about the origin of life. Thank you.