Hi. I'm Steve Bell. I'm a professor of biology at MIT and an investigator of the Howard Hughes Medical Institute. And what I'd like to tell you about today is some of what we know about the mechanisms of chromosomal DNA replication. Now, this event is primarily mediated by the function of complex multi-enzyme machines called replisomes, which include three DNA polymerases, an RNA polymerase, as well as a DNA helicase. And together these enzymes must act to accurately, completely, and rapidly replicate the genomic DNA. The rapidity can be illustrated by the fact that these replisomes can move at up to 1000 base pairs per second, and also that entire genomes can be duplicated in as little as three or four minutes. The accuracy is about 1 mistake in every 10^10 base pairs. To put that into perspective, that would mean that if you typed 60 words per minute for 38 years, continuously, you would have a single typographical error in the document you made, so that's pretty impressive. And finally, it's really important that genomic replication be complete, because at the end of genomic replication comes cell division, and you need a full copy of the genes for both of the daughter cells, so, if you do incomplete replication, someone is not going to get their full complement of genes. It's also worth noting that when you try to segregate the chromosomes into two cells when they're not completely replicated, this will lead to double-stranded breaks, which can be both mutagenic and, in some cases, lethal. Now, it's sometimes hard to appreciate how much DNA replication goes on inside our bodies, but the next slide is an attempt to make that clear to you. So, many of you may know that in each one of your cells there's approximately 2 meters of DNA, and some of you may know that in your entire body there's approximately 150 million kilometers, and that's enough DNA to go from the sun to the Earth.
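The typing analogy above can be checked with a little arithmetic. The following is a minimal sketch, not something from the talk itself: it assumes roughly 6 keystrokes per word (five letters plus a space) and a 365.25-day year, and the names `words_typed`, `KEYSTROKES_PER_WORD`, and `ERROR_RATE` are made up for illustration. Under those assumptions, 38 years of nonstop typing comes out within an order of magnitude of 10^10 keystrokes, which is the scale the analogy relies on.

```python
# Rough check of the "60 words per minute for 38 years" analogy.
# Assumptions (not from the talk): ~6 keystrokes per word, 365.25-day year.

MINUTES_PER_YEAR = 60 * 24 * 365.25
KEYSTROKES_PER_WORD = 6   # hypothetical average: 5 letters plus a space
ERROR_RATE = 1e-10        # from the talk: about 1 mistake per 10^10 base pairs


def words_typed(words_per_minute: float, years: float) -> float:
    """Total words typed when typing continuously for the given number of years."""
    return words_per_minute * MINUTES_PER_YEAR * years


words = words_typed(60, 38)
keystrokes = words * KEYSTROKES_PER_WORD
expected_errors = keystrokes * ERROR_RATE

print(f"words typed:     {words:.3e}")
print(f"keystrokes:      {keystrokes:.3e}")
print(f"expected errors: {expected_errors:.2f}")
```

With these assumed values the expected number of errors lands a little under one, which is consistent with "a single typographical error" as an order-of-magnitude statement.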
Now, sticking with the astronomical sense, the most remarkable number is not how much is in your body right now, but the amount that you will synthesize in your lifetime, which is upwards of a light year of DNA. And, if you think about that, if you're not sort of up on your astronomical units, that's 9.5 trillion kilometers of DNA. And so the mere fact that I can stand here and tell you about DNA replication, and not be just a bumbling mass of mutagenized cells, is a testament to both the accuracy and efficiency of this process. Okay, so I'm just going to show you an animation, here, of the process of DNA replication at the E. coli replication fork. And what I want to do over the next few minutes is to tell you about the various enzymes that are working together in this animation so that you understand exactly what's happening during this process. So, the first thing I want to do is tell you a little bit of the ground rules. So, DNA polymerases always extend the 3' end of a growing DNA chain, and this can be either extending the 3' end of a DNA chain or, it turns out, it can also extend the 3' end of an RNA chain. And in each case, it extends this by reading an oppositely-oriented template strand, and in fact the process... or, the place at which DNA polymerases start DNA synthesis is called a primer-template junction, and I'll be using that term throughout the next few minutes as I describe the function of these enzymes. Now, both strands of the DNA are replicated simultaneously during the replication of chromosomes, and this is to reduce the amount of single-stranded DNA, which is much more prone to chromosome breakage than is double-stranded DNA. Now, there's two different types of DNA polymerases acting at this time. One is called a leading-strand DNA polymerase, and it acts by extending the leading strand DNA towards the unreplicated DNA, or in the same direction as the overall DNA replication process. So, this is very easy to understand. 
It's going to follow right behind the unreplicated, double-stranded form of the DNA. In contrast, replication of the opposite strand has to move in the opposite direction, away from the direction of overall fork movement or replication. And so, in this instance, the primers will be formed and the polymerase will move away from the unreplicated DNA, in the opposite direction of the fork movement. Now, these two events are happening simultaneously at the replication fork, as is illustrated here, and you can see that the lagging strand DNA polymerase is moving in one direction, and the leading strand polymerase is moving in the opposite direction. And when you finish an Okazaki fragment, as these smaller fragments that are made on the lagging strand are called, you then reposition the polymerase and start a new Okazaki fragment, and, eventually, at the end of the replication process, these primers that are used, which I'll have more to say about in a moment, have to be removed, and the DNA linked together to form a continuous strand, unlike the leading strand, which is continuously synthesized. Now, one property of DNA polymerases, all DNA polymerases, is that they cannot start a new DNA strand by joining two deoxynucleotide triphosphates. They have to have a primer, in the form of a primer-template junction. And so, in order to initiate the new strands that are required for the replication process, we need a different enzyme called DNA primase. And what DNA primase does is it synthesizes RNA primers. And this is because, unlike DNA polymerases, RNA polymerases, of which DNA primase is a specialized form, can take two ribonucleotides and initiate a new strand of RNA. Okay. Now, importantly, once it does this, the primer can be extended by the DNA polymerase -- it's basically forming a primer-template junction -- and one interesting property of the DNA primase in E.
coli is that it is stimulated to act by interacting with another important protein that acts at the replication fork, called the DNA helicase. Now, replicative DNA helicases always come in the form of hexameric, ring-shaped structures, as you see, here, okay. And these hexameric ring-shaped structures will encircle one of the two strands of the DNA, and they will then move, in an ATP binding and ATP hydrolysis-dependent fashion, in a defined direction along this single-stranded DNA, and by doing so they will displace the other strand of DNA. So, you can see that, here, with the helicase unwinding the DNA. Now, I've looped this, so you'll see it a few times, but I want to point out, also, that the direction that a helicase moves on its encircled strand is a property of the helicase, and in this particular case I've illustrated the E. coli replicative DNA helicase, called DnaB. And its polarity, as this property is called, is in the 5'-to-3' direction, so you can see it's starting at the 5' end and moving towards the 3' end of the DNA. Now, we've talked about a number of proteins that are involved at this point, but one that is not an enzyme, unlike the ones we've talked about thus far, has a primary role of holding the two strands apart after you unwind. Because, of course, these two strands of DNA that I have over here are in fact complementary to one another, and could rapidly reanneal. Now, this is prevented by two different events. The first one is simple to understand -- that is, that the leading strand DNA polymerase, up here at the top, follows almost directly behind the helicase. And so, that single-stranded DNA is very rapidly converted into double-stranded DNA, and this prevents it from annealing with the complementary lagging strand template.
Now, there's another concern, however, which is that the lagging strand template will anneal on itself, and so there is a specific set of proteins called single-stranded DNA binding proteins, or SSBs, that will bind the single-stranded region of the lagging template, and hold it in a single-stranded state, preventing it from annealing to itself. And what's important about this is not only does it keep it from reannealing, but when a DNA polymerase approaches a region of single-stranded DNA bound by SSB, the SSB is readily displaced, allowing the template that's left behind to be readily replicated by the polymerase. Okay. So, we've talked about the leading and lagging polymerases, but it turns out, at the replication fork, they're part of a larger complex called a holoenzyme, in particular, the DNA polymerase III holoenzyme, which is a very specialized form of the DNA polymerase, for acting at chromosomal DNA replication forks. So, it's illustrated here, and there are several parts to this complex. So, first, there are three copies of DNA polymerase III, which is the third polymerase discovered by Arthur Kornberg and his colleagues in their Nobel Prize-winning work investigating the enzymes involved in DNA synthesis. Now, in addition, there is a second large protein complex, a five-protein complex called a sliding DNA clamp loader, which is itself bound to a sliding DNA clamp. Now, all of the polymerases are held to the sliding clamp loader by a subunit that's present three times, shown here in light blue, called the τ subunit, and I'll have more to say about τ, as it plays a particularly important role in coordinating the events at the replication fork. But before that, I want to tell you a little bit about the sliding DNA clamp loader and the sliding DNA clamp, and what their functions are. So, we'll start with the sliding DNA clamp. So, this is a ring-shaped multimeric protein made up of either two or three identical subunits.
This is an illustration of the sliding DNA clamp from S. cerevisiae, the budding yeast, and you can see that whether we look at the one from S. cerevisiae or a phage, T4, or human cells, or E. coli, they have very similar structures, and you can see that in the overlap, here, okay? And all of them, in this central hole in the protein donut, so to speak, have enough room to fit double-stranded DNA. And you can see that here in a crystal structure of the yeast sliding DNA clamp bound to double-stranded DNA. Okay. Now, what's the purpose of having it surround double-stranded DNA? Well, that's illustrated on the next slide. So, these sliding DNA clamps not only encircle double-stranded DNA, but they also are able to bind to the backside of a DNA polymerase, holding it on the DNA, particularly at a primer-template junction, okay? And it turns out, because they do not specifically interact with the double-stranded DNA, they will follow along with the polymerase as it synthesizes DNA. Now, one property of DNA polymerases we haven't talked about thus far is a property called processivity, and this, put simply, is the number of base pairs that are synthesized each time a DNA polymerase binds to a primer-template junction. Now, it turns out, on their own, polymerases are actually not particularly processive. They'll typically do somewhere up to about 100 base pairs before they fall back off the DNA. Now, what's important about this is, while it stays on the DNA, a DNA polymerase typically adds one base, or makes one base pair, per millisecond, and so it's very efficient at doing that if it stays on the template. However, if it falls back off the template, on average, it's going to take a second to find a new primer-template junction, rebind, and reinitiate synthesis. So, what that means is that every time it falls off, it's lost the chance to add the 1000 base pairs it could have made if it had stayed on. And what the sliding clamp does is prevent that.
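The cost of low processivity can be made concrete with a small calculation. This is a minimal sketch under simplifying assumptions: the polymerase is treated as alternating between synthesizing at a fixed rate (1 base pair per millisecond, from the talk) and spending a fixed 1 second rebinding (also from the talk). The function name `effective_rate` and the 100,000 base pair figure used for a clamp-bound polymerase are hypothetical illustration values, not measurements.

```python
# Average synthesis rate for a polymerase that falls off after a fixed number
# of base pairs and then needs time to rebind a primer-template junction.
# Simplified model; only the 1 bp/ms and ~1 s rebinding figures are from the talk.

def effective_rate(processivity_bp: float,
                   rate_bp_per_s: float = 1000.0,
                   rebind_s: float = 1.0) -> float:
    """Base pairs per second averaged over one synthesize-then-rebind cycle."""
    time_synthesizing = processivity_bp / rate_bp_per_s
    return processivity_bp / (time_synthesizing + rebind_s)


without_clamp = effective_rate(100)       # ~100 bp per binding event (from the talk)
with_clamp = effective_rate(100_000)      # hypothetical value for a clamp-bound polymerase

print(f"without clamp: {without_clamp:.1f} bp/s")
print(f"with clamp:    {with_clamp:.1f} bp/s")
```

In this toy model a polymerase that makes only 100 base pairs per binding event averages under a tenth of its top speed, while one that stays on for very long stretches approaches the full 1000 base pairs per second.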
In fact, I usually refer to these as personal trainers for DNA polymerases, because once it starts, if the polymerase decides, oh, I'm tired, I want to fall off, the sliding clamp holds it on the DNA and puts it right back to the grindstone, and starts this process again. And, importantly, it will only hold the polymerase while active DNA synthesis is occurring. When it reaches the end of a template and you have complete synthesis, it's readily released from the sliding clamp and eventually the sliding clamp is also removed from the DNA. So, now that you know the function of the sliding DNA clamp, we need to talk about how it's put on the primer-template junction so that it can serve this function, and this is the role of the so-called sliding DNA clamp loaders. So, these are five-subunit complexes that use the energy of ATP binding and hydrolysis to load sliding DNA clamps, specifically at primer-template junctions. Now, how does this work? Well, the first step in this process is the binding of ATP by the sliding DNA clamp loader. This changes its conformation and makes it competent to bind both the sliding DNA clamp and the primer-template junction. When the sliding DNA clamp binds, it changes its conformation by opening up the interface between two subunits, creating a crack or an opening in the ring-shaped structure. Importantly, this is big enough to fit double-stranded DNA through, and when double-stranded DNA binds to the sliding DNA clamp loader, it does so such that it is now encircled by the sliding DNA clamp. Importantly, only a DNA that has a primer-template junction can actually fit within this region. Completely double-stranded DNA can't bend enough to fit in the sliding DNA clamp loading site. Also important is the presence of a 3' hydroxyl at the site of the ATP binding, which stimulates the ability of ATP to be hydrolyzed. 
So, when ATP is hydrolyzed, this causes the sliding clamp loader to change conformation and release the sliding clamp and the primer-template junction DNA, and this allows the sliding clamp to close again around the double-stranded portion of the DNA, and now it's ready to recognize a DNA polymerase and facilitate its processive DNA replication. Now, at this point, I've told you about a lot of different enzymes, and I just want to tell you a little bit about how they work together at the replication fork. So, the DNA polymerase III holoenzyme, this large multi-enzyme complex, actually does more than just synthesize DNA and load the clamps -- it also stimulates the DNA helicase. And this is mediated, again, by that same τ subunit that is interacting with the DNA polymerase subunits, and this plays an important role, because if the DNA polymerase, either on the leading or lagging strand, becomes stalled, it will pull the Pol III holoenzyme away from the helicase, causing the helicase to slow down during the time it takes for the polymerase to restart synthesis. So, for example, if it hits a lesion in the DNA that has to be repaired, the helicase won't run away at the same rate as it would if it were bound to the polymerase, because the polymerase is now detached. Once the polymerase can bypass that lesion, or the lesion is repaired, it can catch back up to the helicase and the process can become very rapid, again. I've already told you that primase activity is stimulated by binding to the DNA helicase, and in fact if you modulate the level of interaction, or the affinity, of the primase for the helicase, you can actually change the rate at which it primes new synthesis, and so if it's faster you'll make shorter Okazaki fragments, and if it's slower, or lower affinity, you'll make longer Okazaki fragments. So, the rate of Okazaki fragment formation is actually determined by this affinity.
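The link between priming rate and Okazaki fragment length follows from simple arithmetic: if the fork advances at a steady rate, the length of each fragment is roughly the distance the fork travels between priming events. The sketch below assumes a constant fork rate and evenly spaced priming events; the function name `okazaki_length_bp` and the specific priming intervals are hypothetical values chosen for illustration, not measured numbers from the talk.

```python
# Okazaki fragment length as fork speed multiplied by the time between primers.
# Assumes a constant fork rate and evenly spaced priming events (a simplification).

def okazaki_length_bp(fork_rate_bp_per_s: float, priming_interval_s: float) -> float:
    """Approximate fragment length when a new primer is laid down every interval."""
    return fork_rate_bp_per_s * priming_interval_s


FORK_RATE = 1000.0  # bp/s, the E. coli fork speed quoted in the talk

for interval in (0.5, 1.0, 2.0):  # hypothetical priming intervals, in seconds
    length = okazaki_length_bp(FORK_RATE, interval)
    print(f"primer every {interval:.1f} s -> fragments of about {length:.0f} bp")
```

More frequent priming (a shorter interval) gives shorter fragments and less frequent priming gives longer ones, which is the relationship described above.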
Finally, DNA polymerase III, as I told you, has its processivity dramatically stimulated by the sliding clamp and, in turn, also by the sliding clamp loader, since that is required for loading it. So, now that I've told you about the enzymes and how they work together at the fork, I'd like to take you through the events that are occurring at a replication fork, one by one. So, this is an illustration of a replication fork bound to the DNA polymerase III holoenzyme. You can see the helicase, here, with its unreplicated and unseparated DNA. At the top is the leading strand polymerase; at the bottom is a lagging strand DNA polymerase. There's also a third polymerase, as I've explained, which is, at this point in the reaction, unengaged, and you'll see how it becomes engaged as we move forward. Now, you'll also note that there are large regions of single-stranded DNA, here and here, that are bound to the single-stranded DNA binding protein. And in fact, I've labeled this the trombone model of bacterial DNA replication, because this loop down here actually gets bigger and smaller depending on where you are in the replication process, much as a trombone slide goes in and out as you play different notes. Now, I've shown you the SSB that's bound to the single-stranded DNA, here, but for the rest of the illustrations, just to reduce the clutter, I've removed that. Now, you'll note that there's a large single-stranded region adjacent to the helicase, and this is actually the perfect substrate for the primase, which will come in and synthesize a short primer at this single-stranded DNA region. And this is, again, mediated by the affinity of the DNA primase for the helicase. Now, as soon as this is synthesized, it's recognized by the sliding DNA clamp loader as a primer-template junction. And it then loads a sliding DNA clamp onto the primer-template junction, making it, now, ready to be recognized by the unengaged DNA polymerase.
So, what happens next is that the polymerase binds the sliding clamp, associates with the primer-template junction, and begins to initiate a second Okazaki fragment. Now, I want to point out, during the processes that I've been explaining, the leading strand polymerase has continued synthesizing, as has the other lagging strand DNA polymerase. And, in fact, once that lagging strand polymerase reaches the end of its single-stranded template -- that is, the beginning of the previous Okazaki fragment -- it will fall off the DNA, just as I explained, because it no longer has template, and it will become an unengaged DNA polymerase, just as the one that is currently making the second Okazaki fragment was unengaged at the beginning of the reaction. Okay. So, we've taken you through this in slow motion. Let's look at what it looks like in real time. Okay. So, you should now be able to label all these different subunits in this process. So, you can see, in blue, here, this is the DNA helicase, okay? And it's unwinding the DNA, both strands, and feeding it to two different DNA polymerases, here. You can see the leading strand polymerase, which is immediately using the leading strand template to make new double-stranded DNA, and you can see its associated sliding DNA clamp, here, in the green. Now, you'll also notice that you see the arrival of a primase, okay, here it comes, right there, and it lays down a primer, which is immediately recognized by the sliding DNA clamp loader, which puts a new sliding DNA clamp on. And in this case, because the leading strand already has a sliding DNA clamp, this is immediately recognized by the lagging strand DNA polymerase, shown again in purple, and its associated sliding DNA clamp. Okay. Now, actually, I should point out...
there's one obvious difference between the model I showed you and this model, which is that there's only two DNA polymerases, and that's because at the time this animation was made, it wasn't actually known that there were three, instead of two, DNA polymerases. But that turns out to be very important, because, as you'll notice, there's a period of time, each time a new primer is synthesized, and the lagging strand DNA polymerase has to recognize that new template, where there's no synthesis occurring on the lagging strand template. Now, in some organisms, this would be just fine, but in E. coli the replication fork actually moves at 1000 base pairs per second, and as you'll recall that's the absolute maximum rate a DNA polymerase can go. So, while the leading strand polymerase can handle that quite easily, the lagging strand polymerase couldn't do it if it had to come off and rebind all the time. On the other hand, if you have a third DNA polymerase, it becomes obvious how you can always have at least one polymerase, and often two, acting on the lagging strand, allowing the overall fork movement to go at 1000 base pairs per second, which is what's observed in vivo. Now, sometimes it's hard to understand how fast this really is, so let me give you a little way to think about it that's, I think, pretty impressive. So, the double-stranded DNA is 20 Angstroms wide, but let's just imagine that it's 1 meter wide, okay? Now, in that situation, the nucleotides that would be incorporating would be floating around the room you're in at about the size of a textbook, okay? And this replisome machine would be about the size of a FedEx truck, okay? The sliding DNA clamps would be the size of very large wheels, okay? And what's most impressive for a replication fork moving at 1000 base pairs per second, that FedEx truck, at the scale of the DNA being a meter wide, would be moving at 375 miles per hour. 
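The scaled-up speed can be reproduced with a short calculation. This is a rough sketch rather than the speaker's own working: it assumes a rise of about 0.34 nm per base pair for B-form DNA (a standard textbook value that is not stated in the talk) and uses the 20 Å width and 1000 base pairs per second from the talk. The constant and variable names are made up for illustration.

```python
# Scale a replication fork up so that the DNA double helix is 1 meter wide,
# then ask how fast the replisome would be moving at that scale.

RISE_PER_BP_M = 0.34e-9        # assumed: ~0.34 nm per base pair (B-form DNA)
DNA_WIDTH_M = 20e-10           # from the talk: 20 Angstroms
FORK_RATE_BP_PER_S = 1000.0    # from the talk
METERS_PER_SECOND_PER_MPH = 0.44704

scale_factor = 1.0 / DNA_WIDTH_M                     # magnification making DNA 1 m wide
real_speed_m_s = FORK_RATE_BP_PER_S * RISE_PER_BP_M  # actual fork speed along the DNA
scaled_speed_m_s = real_speed_m_s * scale_factor
scaled_speed_mph = scaled_speed_m_s / METERS_PER_SECOND_PER_MPH

print(f"scale factor:  {scale_factor:.1e}")
print(f"scaled speed:  {scaled_speed_m_s:.0f} m/s")
print(f"scaled speed:  {scaled_speed_mph:.0f} mph")
```

Under these assumptions the result comes out within a few percent of the 375 miles per hour quoted above; the small difference depends on the exact rise per base pair and width used.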
So, if you were standing in a classroom with a double-helix tube 1 meter wide going across the classroom, and let's say it's 50 feet across, that replisome would come through, spend about 0.1 second in the room, and change that one tube of 1 meter into two tubes before you would even really notice it. Okay? So, these are really remarkable machines that are accomplishing quite a bit at a time. So, I want to end by talking about a comparison between the bacterial replication fork, which I've told you about in great detail, and what we know about the eukaryotic replication fork. So, there are corresponding proteins in eukaryotes to all the proteins I talked about for the replication fork in bacteria. However, it's a little bit more complicated there. So, instead of the single DNA polymerase III that's involved in replication at the bacterial replication fork, there are actually three different DNA polymerases that work at the eukaryotic replication fork. Pol δ is exclusively making the lagging strand DNA. DNA Pol ε is exclusively making the leading strand DNA. Now, the third polymerase is actually involved in the priming event. So, while in E. coli the primase is called DnaG and it's a single polypeptide, in eukaryotic cells it's a complex between a primase, actually, a two-subunit primase, and a DNA polymerase called DNA polymerase α. And the primase synthesizes the short RNA primers and then immediately hands them off to the DNA polymerase α, which then extends them for a brief period of time before, in turn, allowing them to be taken over by either Pol δ or Pol ε, the lagging or leading strand polymerases. As I showed you before, there are both E. coli and eukaryotic sliding clamps. In E.
coli it's called β; in eukaryotic cells it's called proliferating cell nuclear antigen, or PCNA, and, not surprisingly, this was originally identified just because it was prominently present in dividing or proliferating cells, thus the name PCNA. The sliding clamp loader is called the τ complex in bacteria, for reasons that you understand, now, and it's called replication factor C, or RF-C, in eukaryotic cells. Again, the helicase is a little bit more complicated a story in eukaryotic cells versus bacterial cells. It's a single subunit repeated six times, the DnaB protein repeated six times, in bacteria, but in eukaryotic cells the core of the helicase is the Mcm2-7 complex, which is made of six different subunits -- Mcm2, Mcm3, Mcm4, Mcm5, Mcm6, and Mcm7 -- that form a hexamer with one of each subunit in the hexamer, so it's a heterohexameric protein. And it's not very active on its own. Instead, it's activated by binding to two other proteins, Cdc45 and GINS, to form the so-called CMG complex, which is the active replicative helicase. So, I've told you already that there are three different DNA polymerases at the eukaryotic fork. We know less about the interactions at the fork as well, although we do know that Pol ε is held at the replication fork by interactions with the GINS protein. In contrast, neither Pol δ nor the eukaryotic clamp loader is part of the replication fork, so that's very different from the prokaryotic fork. And this may be because eukaryotic forks move at a much more leisurely pace of 20-60 base pairs per second, compared to the 1000 base pairs per second in bacteria, and so those lagging strand events that have to be tightly coordinated in bacteria can, quite likely, occur by just binding from solution, instead of being tethered immediately to the site of DNA replication, because there's much more time for them to occur. And, in fact, lagging strand synthesis probably occurs after the fork has gone by.
So, at this point I hope you understand a lot more about how the replication fork works and, if you stay tuned to my next presentation, we'll talk about how you assemble these replisomes at sites of initiation and how that's regulated during the cell cycle. So, I just want to thank the people involved in this process. So, Sera Thornton did most of the animations that you saw. They were done in the MITx Biology office as part of the development of a course that Sera and Mary Ellen Wiltrout helped me develop called 7.28x, which is available on the MITx and edX platforms, and is a full molecular biology course, talking about all the elements of the central dogma. In addition, the very high resolution animation showing the E. coli replication fork in action was done in the DNA Learning Center, which is part of the Cold Spring Harbor Laboratory. Finally, I want to thank the Howard Hughes Medical Institute and the National Institute of General Medical Sciences for supporting my research.