Hello, my name is Britt Glaunsinger. I'm a virologist and a professor at the University of California, Berkeley, and an investigator of the Howard Hughes Medical Institute. What I'm going to be doing is presenting a lecture on the fundamental molecular virology of coronaviruses. These are viruses that have been circulating in the human population and in animals for a long, long time. We know of seven human coronaviruses. These are present in two of the four known genera of coronaviruses, the alphacoronaviruses and the betacoronaviruses. The four circulating strains of human coronavirus are shown boxed in red here. There are two that we've known about for a long time, called 229E and OC43. The other two, NL63 and HKU1, were actually discovered more recently, after the SARS epidemic, but are also thought to have been circulating in the human population rather than recently emerging. These four circulating viruses are some of the causes of the common cold. Probably 10 to 15 percent of common colds are caused by these viruses. We also know of three coronaviruses which have recently emerged into the human population through species jumping, or zoonotic transfer, and these, of course, are the original SARS coronavirus, MERS coronavirus, and the newly emerged SARS coronavirus 2, which is the cause of COVID-19. Each of these, like other alpha- and betacoronaviruses, is thought to have a common ancestor in bat viruses. This is different from the gamma- and deltacoronavirus genera, which have common ancestors in birds. SARS coronavirus and MERS coronavirus, as I mentioned, probably came from bats. But it's thought that rather than jumping directly into humans from bats, they entered the human population through one or more intermediate animal hosts.
For SARS, the intermediate animal host, or at least the main one, is thought to be civet cats, and for MERS coronavirus the intermediate host is probably dromedary camels. The virus probably jumped from bats to these intermediate hosts, underwent some rounds of replication in them, and in doing so probably acquired mutations that allowed it to transmit more easily to the human population. Now, we don't know what other intermediate hosts there may be, and we particularly don't know what the intermediate host, if there is one, is for SARS coronavirus 2, the current pandemic strain. It's possible that it came directly from bats. It's also possible that it went through one or more other animals before jumping into humans. I should mention that these are certainly not the only coronaviruses in bats. More than 500 coronaviruses have been identified in bats in China, and the estimates of unknown bat coronavirus diversity reach into the thousands, indicating that these viruses are probably massively under-sampled in the bat population. I think this is important because it suggests that the current pandemic strain, SARS coronavirus 2, is unlikely to be the last we will see of coronaviruses. In particular, we have now had three zoonotic jumps of highly pathogenic coronaviruses into the human population in less than twenty years, and with the huge diversity of coronaviruses likely circulating in bats, I think many scientists and virologists are quite concerned that they will continue to jump in the future as well. So they are very worthy of continued study even after the current pandemic is finished. It's worth comparing the other two highly pathogenic zoonotic coronaviruses, SARS and MERS, to the current pandemic. SARS emerged in late 2002 and caused a little over 8000 cases and 774 deaths globally.
This was an epidemic that lasted about a year. It was mostly brought under control in 2003, and the last cases, which were related to a laboratory outbreak, were seen in 2004. That epidemic has now ended. MERS, on the other hand, has caused fewer cases, 2521 to date, with 866 total deaths. This is actually the most pathogenic of these viruses and the one with the highest death rate, about 34 percent. Unlike SARS, MERS infections still periodically occur, and this is probably not through circulation in the human population. This virus does not transmit particularly well from human to human, and these new infections are thought to come from occasional recurrent spillovers from dromedary camels into the human population. So why is it that the SARS epidemic could be brought under control within about a year, whereas we are clearly far from bringing the current SARS coronavirus 2 pandemic under control? There are several ideas that I've heard discussed on this, and I just want to bring up three here, comparing SARS to the current COVID-19 pandemic. First, the spillover reservoir for SARS coronavirus, as I mentioned, was known. This is primarily the civet cat, and these animals could be culled in an attempt to break the chain, so that there were no further transmissions from these animals into the human population. For CoV-2, as I mentioned, the spillover reservoir is not known. Second, for SARS-CoV-1, most of the human transmission occurred in a hospital setting, and indeed hospitals were hubs of transmission for that epidemic. Once this and the risk to the medical community were recognized, personnel were able to implement barrier nursing in order to stop transmission of the virus. CoV-2, by contrast, has not just spread in hospital settings. There is widespread community transmission of this virus.
And then finally, for SARS-CoV-1, individuals infected with the virus tended not to transmit until probably 24 to 36 hours after the onset of symptoms, and in general there was a lack of asymptomatic cases, as far as we know. This is really important from a contact tracing perspective, for the ability to stall or inhibit spread of the virus within the population through effective contact tracing and other public health measures. Unfortunately, for CoV-2 the situation is very different, in that there are possibly, and maybe likely, abundant asymptomatic cases (further screening will be needed to confirm that) and certainly abundant mild cases, which are furthering transmission in the human population. So for these reasons, and probably other reasons having to do with the molecular virology and epidemiology, the pandemic that we are experiencing now is very different from the one that we saw in 2003 with SARS, and as of this morning it is reaching nearly 400,000 total confirmed global cases. This is not likely reflective of the actual number of cases. These are just the cases confirmed through screening, and as we know there are currently limitations in testing, so the actual number of cases is thought to be much higher than this. As of this morning, March 24th, 2020, we were reaching over 17,000 deaths: 17,252 deaths worldwide. As you can see from this graph, unfortunately most Western countries are on a very significant coronavirus trajectory, one of basically exponential growth. While some countries have been able to slow down growth and limit spread, this is largely not the case for the UK, Europe, and the United States in particular, whose curves show clear exponential spread, and so I think we can anticipate that the number of cases is going to continue to grow exponentially in the future.
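The death rates quoted above for SARS and MERS can be checked with simple arithmetic. The sketch below is only an illustration using the counts given in the lecture. The SARS case count is stated only as "a little over 8000", so 8000 is an approximation, and the function and variable names are invented for this example.

```python
# Case and death counts as quoted in the lecture. The SARS case count is
# only given as "a little over 8000", so 8000 is an approximation.
OUTBREAKS = {
    "SARS": {"cases": 8000, "deaths": 774},
    "MERS": {"cases": 2521, "deaths": 866},
}


def crude_fatality_percent(cases: int, deaths: int) -> float:
    """Return deaths as a percentage of confirmed cases."""
    return 100.0 * deaths / cases


for name, counts in OUTBREAKS.items():
    percent = crude_fatality_percent(counts["cases"], counts["deaths"])
    print(f"{name}: {percent:.1f}% of confirmed cases were fatal")
```

This is a crude ratio of deaths to confirmed cases. As noted above for the current pandemic, confirmed cases can undercount true infections, so a ratio of this kind should be read with caution.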
All right, we've heard a lot in the news and from experts about the epidemiology of this virus, about its spread, transmission, and control, and so that is not going to be the focus of this lecture today. Instead, what I want to do is really drill down on the molecular virology of how this virus enters and replicates within cells in order to amplify itself. I've broken the lecture down into four parts. In the first part, I'm going to discuss how the virus is able to enter cells through interactions between the spike protein and host receptors. I'm then going to spend time talking about how, once it deposits its genome into cells, the virus replicates that genome and gets its genes expressed, and there are some very unusual and interesting features of coronavirus biology in this section. I'm then going to move on to some of the remarkable cell biological changes that occur in an infected cell, particularly involving membranes and the formation of what are called replication and transcription complexes during coronavirus replication. And then I'm going to spend the last few minutes talking about the interactions that this virus has with the immune system, in particular the innate immune system, as these are likely drivers of pathogenesis of these viruses in animal hosts and in humans. All right, starting with the structure of the viral particle and entry, we know that coronavirus particles are pleomorphic, which means they don't really have a defined structure. They've been looked at by cryo-electron tomography to confirm this, and they also have what's called a helical nucleocapsid. So looking at the structure of the virus that I'm showing you here on the left, the nucleocapsid, shown in the center in brown, basically refers to the genome, which is a 30 kilobase genome, huge for an RNA virus, made of positive sense, or plus sense, RNA.
When we say plus sense, that means it can be directly read by ribosomes in the cell. That genome is coated with a protein called the nucleocapsid protein, which forms this helical nucleocapsid. The nucleocapsid-protected genome is encased in a lipid envelope that is derived from the host cell. Many viruses have a lipid envelope. In all cases, those lipids are taken from the host. No virus is able to make its own lipids, but many viruses make use of and steal host lipids for their replication and sometimes for their morphogenesis. That is the case for coronaviruses, where you can see there is a lipid envelope studded with a number of viral proteins, the most prominent of which is the spike protein, shown in blue. This is the one that of course gives coronaviruses their name, for either the corona-like halo effect seen during a solar eclipse, which looks like this, or the crown-like appearance of these viruses under the electron microscope. The spike protein, as we'll talk about in a minute, is critical for the viral entry process. Additionally, in red is a membrane glycoprotein called the matrix protein. This is the most abundant protein on the outside of the viral particle, and its role is basically to connect the membrane to the nucleocapsid. You can see in the inset there that this is a transmembrane protein, but it has a significant C-terminal domain which makes contacts with the nucleocapsid protein, and that's probably important for the morphogenesis phase of the viral life cycle, when these virions are formed. Another, minor envelope protein called E is present as well. It is also thought to be important for formation of these viral particles at the end of the viral life cycle. A little bit more about the spike protein: a few different research papers have now been published showing structural information for the SARS coronavirus 2 spike protein.
I've pulled this from one of the papers, which is cited below. What it shows is a cryo-electron microscopy structure of the SARS coronavirus 2 spike protein, with the sequence conservation of related spike proteins from other coronaviruses plotted onto the SARS-CoV-2 spike structure. The residues are color-coded based on their level of conservation across these related viruses. The spike is a trimeric protein, and what you'll notice is that there are sort of two domains. There's this upper globular domain, which is the receptor-binding domain. This is the part that engages the host cell receptor, and we'll talk about that on the next slide. In this domain you'll see that there are many residues colored in a teal color, which indicates that they are highly variable. Indeed, the receptor-binding domain of the spike protein is the most variable part of the coronavirus genome, and this tends to be common for viruses in general. This is a region of viruses that is under intense evolutionary pressure because of interactions with the immune system. The lower part of the spike protein is the part that contains the fusion machinery that is important for the entry process. You'll notice that this part, in purple, is much more conserved, and that is also a classic finding: the fusion machinery tends to be very conserved. Tucked in the center of the fusion machinery is this hydrophobic fusion peptide, which is very important for fusing the viral membrane with the host membrane so that the virus can deposit its nucleocapsid payload into the cytoplasm of cells. So what does this entry process look like? Well, as I mentioned, the spike protein is the protein responsible for engaging a cellular receptor. You can think of this as a lock-and-key mechanism,
where the key is the viral glycoprotein and the lock is the cellular receptor. Different viruses will use different cellular receptors as a way of getting into cells. The receptor for both SARS-CoV-1 and CoV-2, we know, is the same protein. It's a cellular protein called angiotensin-converting enzyme 2, or ACE2. Binding to that protein is important, but it is not enough. You need a second feature to happen, and that is a proteolytic cleavage event. This is carried out by a cellular protease called TMPRSS2, and perhaps others, but that is the one people have suggested is clearly involved in SARS coronavirus 2 entry. So what happens is that the spike protein interacts with the receptor. This protease then comes and cleaves the spike protein. Actually, there are at least two cleavage events that are known for SARS coronavirus and probably CoV-2. In the first cleavage event, the receptor-binding domain of the spike protein is separated from the fusion domain. The second cleavage event, which is not shown here, is an activating event that triggers the fusogenic state of this protein. That then allows subsequent entry, which for coronaviruses may occur directly at the plasma membrane, upon endocytosis, or at both sites. That really hasn't been resolved. So the spike protein is really a classic class 1 fusion protein, and there are a number of viruses that have fusion proteins of this type. The best characterized is the influenza hemagglutinin protein. The Ebola virus fusion protein is also class 1. The HIV fusion protein is also class 1. What I've outlined here on the bottom are the basic stages that are known to underlie the fusion mediated by these class 1 fusion proteins. First, there is the pre-fusion state, which you can think of as almost a metastable state for the fusion protein.
Prior to the proteolytic event that triggers the fusion process, the receptor binding subunit, which has not been cleaved off yet, can be thought of as clamping the fusion subunit and keeping it tucked away and inactive until the virus has encountered the appropriate host cell and can be activated by these proteolytic cleavage events. The protease cleavage that we talked about then causes the receptor binding subunit to move out of the way, and that unclamps the fusion subunit so that it can form a pre-hairpin that is embedded into the target membrane of the cell. This occurs through the fusion peptide. The fusion peptide is usually a stretch of hydrophobic amino acids, which means that it can be inserted into the membrane. This pre-hairpin then starts to fold back, basically forming a six-helix bundle and progressively pulling the cellular and viral membranes together to promote fusion. The final post-fusion conformation in these class 1 fusion proteins is always a trimer of hairpins. By this mechanism, once fusion has occurred, the viral nucleocapsid with the genome payload can be deposited directly into the cytoplasm of the cell. Some early studies that have now emerged on SARS-CoV-2 indicate that there are some interesting differences between its spike protein and that of the original SARS-CoV-1. The first difference is this: scientists know from research with the spike protein of SARS-CoV-1 that there are basically six critical amino acids within the receptor-binding domain that are necessary for interaction with the ACE2 receptor, and interestingly, five of those six residues are different in SARS-CoV-2 than in SARS-CoV-1. Nonetheless, CoV-2 is still able to interact quite efficiently with the ACE2 receptor. The second notable difference is that, uniquely, SARS-CoV-2 seems to have acquired a polybasic cleavage site.
This polybasic cleavage site is interesting and important because it's predicted to enable cleavage by other cellular proteases beyond the one that we talked about. It may also enable efficient cleavage by the cellular protease TMPRSS2, known as the canonical one that's been thought about for this virus. It is particularly important because insertion of a polybasic site in other viruses has been shown to increase transmissibility, particularly for pathogenic influenza viruses. So it's going to be important to figure out whether the same is true for SARS-CoV-2. Okay, so that covers entry, and we're now going to move on and talk about what happens to the viral genome once it has moved into the cytoplasm of the cell. Well, the 2019 CoV-2 genome has been annotated, and depending on the annotation that you look at, it's thought to possess about 14 open reading frames, encoding an estimated 27 or so proteins. Now let's think about this for a minute, because it's kind of remarkable. Remember that the viral genome is a single stretch of RNA that is incredibly long. It's 30 kilobases long. But for any virus, and the same is true for coronaviruses, once that RNA is deposited into the cell, the ability to translate, or generate proteins from, that RNA requires the virus to basically follow the gene expression rules that are set by the host cell. For eukaryotes, translation is generally a monocistronic process, which means that a ribosome comes and recognizes an RNA in the cell and will generally translate one open reading frame, one gene, from that RNA before recycling and falling off. This is different from prokaryotes, which of course have multicistronic RNAs where multiple proteins can be translated from the same RNA. That is not generally true in eukaryotes. So how is it that from one RNA this virus is able to express 27 different proteins?
Well, for coronaviruses, there are at least three well-known solutions that the virus has evolved in order to solve this problem of expressing many proteins using the eukaryotic rules of gene expression and translation, and we're going to talk in some detail about those. The first, if you'll notice, is that a large portion of the genome is made up of a single open reading frame, called open reading frame 1, which is separated into two sub-open reading frames, 1a and 1b. This is a giant open reading frame that is basically translated into what's called a polyprotein. It's a series of many proteins fused together, with no intervening stop codons, to generate one giant protein, which is then proteolytically processed, and we'll see that on the next slide. This protein is also generated in two different forms through the use of a programmed ribosome frameshifting event, which we'll also talk about on the next slide. That gets you translation of all of the proteins encoded in that portion of the genome, which are generally the nonstructural proteins. But it doesn't get you translation of the structural and other accessory proteins, which are found in the 3-prime half of the genome, toward the 3-prime end. For these to be made, the virus uses a very unusual strategy of discontinuous transcription that produces something called subgenomic RNAs, and we'll discuss those as well. So to start, remember that the virus first has to make the proteins that are going to be necessary for it to copy its genome and transcribe the rest of its genes, and for any RNA virus this requires an RNA dependent RNA polymerase, or RdRP. If you are a plus sense RNA virus, like coronaviruses are, your incoming genome is basically recognized as a messenger RNA. It's ribosome ready.
So you don't have to package that RNA dependent RNA polymerase complex or protein in your virion, because it can be directly translated from the genome, and that is what happens. That's what is encoded by this giant open reading frame 1a or 1ab. As I mentioned, this is made into a huge polyprotein. Within this polyprotein are two proteases that the virus encodes, and the job of those proteases is basically to cleave this giant polyprotein, as shown here on the left in the lower pullout, into the individual proteins, which are going to have separate functions in viral gene expression and replication. So you get proteolysis to generate many different proteins from one initially translated polyprotein. Additionally, you'll notice that this polyprotein, as I mentioned, is not just translated as one giant open reading frame. There's a frameshifting event. A portion of the time, maybe 50 or 60 percent of the time, the ribosome will read along, reach a stop codon at the end of ORF 1a, and stop there. However, the remaining percentage of the time, the virus enables the ribosome to actually read through that stop codon and continue translating to generate a longer ORF 1ab fusion. That programmed translational read-through is a frameshifting event that is governed by two properties of the genomic RNA. The first is that right around that stop codon there's something called a slippery sequence, shown in the RNA diagram on the right, the sequence UUUAAAC, and when the ribosome lands on this site it's known to have a propensity to occasionally slip back out of frame. Now, the frequency with which that frameshifting event occurs can be increased, and is increased, in these coronaviruses because just downstream of that slippery sequence is what's called an RNA pseudoknot structure.
This is basically a highly stable RNA structure that causes the ribosome to pause when it encounters it. The structure is thought to interact with the ribosome, causing the ribosome to pause over the slippery sequence, which increases the chances that it will slip back out of frame. If it slips back one nucleotide out of frame, that stop codon at the end of ORF 1a is no longer read as a stop codon, and the ribosome can continue to translate through and generate the rest of the viral polyprotein. Okay, then that protein is processed, as I mentioned, but how do you get production of all of the rest of the proteins that are encoded at the 3-prime end of the viral genome, the structural and accessory proteins? These are made from what is basically a nested set of what are called subgenomic RNAs, which are all 3-prime coterminal. This is important, if you think about it, for how these are going to get their proteins expressed. These are not polyproteins, but having this nested set of subgenomic RNAs enables each of the genes at the 3-prime end of the genome to have a chance to be present as the 5-prime-most open reading frame on a messenger RNA. Let's think about it this way. If you generate an RNA where, in this case, gene 2, which would represent the spike protein, for example, is at the 5-prime end, the ribosomes are going to come and translate gene 2, and everything downstream of it, based on the eukaryotic rules of translation, is basically going to be viewed as UTR sequence, untranslated sequence. So only gene 2 will get translated, into protein 2, or spike in this case. The same is true if you generate a transcript in which gene 3 now has the chance to be the 5-prime open reading frame: it will get translated into protein, and everything downstream will be untranslated sequence.
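The programmed frameshift described at the start of this passage can be made concrete with a toy model. This is only a sketch: the short message below is invented for illustration, the slippery site is written in its usual seven-nucleotide form UUUAAAC, and the function name is hypothetical. The model reads codons until a stop, and optionally steps back one nucleotide once the slippery site has been decoded.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
SLIPPERY_SITE = "UUUAAAC"

# Invented toy message: in the starting frame the "ORF 1a" part ends at a UAA
# stop; the frame one nucleotide back stays open until the final UGA.
TOY_MRNA = "AUGGCUUUAAACGGGUAAGCGCUUGA"


def read_codons(mrna, slip=False):
    """Read codons from the first nucleotide until a stop codon.

    If slip is True, the ribosome steps back one nucleotide exactly once,
    immediately after it has decoded the slippery site.
    """
    site = mrna.find(SLIPPERY_SITE)
    slip_at = site + len(SLIPPERY_SITE) if site >= 0 else None
    codons, pos, slipped = [], 0, False
    while pos + 3 <= len(mrna):
        codon = mrna[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
        pos += 3
        if slip and not slipped and pos == slip_at:
            pos -= 1  # the -1 frameshift
            slipped = True
    return codons


print("no slip:", read_codons(TOY_MRNA))
print("slip   :", read_codons(TOY_MRNA, slip=True))
```

Without the slip the read ends at the first stop codon. With the slip that stop is out of frame, so the read continues into the downstream sequence, which is the same logic as the ORF 1a versus ORF 1ab products.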
So every open reading frame at the 3-prime end of the viral genome has a chance to be the 5-prime open reading frame on a messenger RNA, allowing it to get translated. How this happens is quite fascinating and involves another feature, which I hope you have noticed on these RNAs that I've drawn: in addition to being 3-prime coterminal, all of them have the exact same sequence at the 5-prime end, and that sequence is the same as the one at the 5-prime end of the genomic RNA, called the leader or L sequence. So how is it that you are able to take that sequence, which is not present within the 3-prime end of the genome, and fuse it to the ends of each of these subgenomic RNAs? The answer to that basically underlies how these are produced. It involves a series of sequences called transcription regulatory sequences, or TRSs. At the junction between each of those genes encoded by the virus, as well as at the 5-prime end of the genomic RNA just downstream of the leader sequence, which is denoted in red here, are these conserved TRSs. As the polymerase is coming along and copying the genome, it's going to reach these TRSs, which are at the 5-prime end of each of the genes. There's a highly conserved core sequence within these TRSs, denoted as CS here in yellow. Once the polymerase gets to one of these TRSs and copies this core sequence, it can either continue to copy or it can jump from that sequence, probably through a long-range RNA-RNA interaction, and base pair with the same core sequence that is part of the TRS at the 5-prime end of the genomic RNA, just downstream of the leader. The polymerase will then continue to transcribe, thereby capturing the leader sequence. So this looks something like this, where the nascent RNA is shown in red.
The RNA polymerase starts to copy. You'll see that the TRSs are present just upstream of each of the genes in the virus. As the polymerase gets to one of the TRSs, it can either read through that TRS and go on to the next one, or it will jump, basically translocating to the TRS at the extreme 5-prime end of the genome, and finish its transcription to generate that fusion with the leader sequence. So this is discontinuous transcription. It allows for the generation of a series of these subgenomic templates. Remember, these are copied from the plus sense RNA genome, so they are now negative sense, or minus sense, RNAs. They're complements of the genome but not ribosome ready themselves. For that, the polymerase now has to go back and make copies of these minus sense subgenomic templates to generate the actual positive sense messenger RNAs that can be translated. It's worth noting that this mechanism of discontinuous transcription means that there's a lot of polymerase jumping, and it probably facilitates what are known to be extraordinarily high recombination rates within coronaviruses. I've heard estimates as high as about 25%. Most RNA viruses, and plus sense RNA viruses, have vanishingly low levels of recombination, so this is a unique feature of coronaviruses which may be interesting in regard to how they evolved, and perhaps also how they are able to maintain such enormous genomes. So this discontinuous transcription mechanism is quite complex and is orchestrated by a replicase that includes the polymerase but many other proteins as well. This replicase complex requires functional integration of the RNA polymerase, capping, and proofreading activities, as well as other things.
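The discontinuous transcription mechanism just described can be sketched with a toy model. Everything here is an assumption made for illustration: the miniature genome, the six-letter motif standing in for the TRS core sequence, and the function names are all invented. The model pretends the polymerase jumps at every body TRS, builds the minus sense templates, copies them back into plus sense messages, and then applies the rule that only the 5-prime-most open reading frame is translated.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Invented toy genome: leader, core sequence (CS), then three tiny genes,
# each preceded by the same CS motif, and a short 3-prime tail.
LEADER = "GGAUUA"
CORE = "ACGAAC"  # placeholder motif standing in for the TRS core sequence
GENES = ["AUGCCCUAA", "AUGUUUUAG", "AUGGGGUGA"]
GENOME = LEADER + CORE + (CORE.join(GENES)) + "AAAA"


def complement_strand(rna):
    """Return the complementary strand, written 5-prime to 3-prime."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def minus_strand_templates(genome, core):
    """Model a jump at each body TRS: body copied first, then the leader."""
    leader_end = genome.find(core) + len(core)  # leader plus its core sequence
    templates = []
    pos = genome.find(core, leader_end)
    while pos != -1:
        body = genome[pos + len(core):]          # everything 3-prime of this TRS
        fused_plus = genome[:leader_end] + body  # leader-body fusion, plus sense
        templates.append(complement_strand(fused_plus))
        pos = genome.find(core, pos + 1)
    return templates


def first_orf(mrna):
    """Return the 5-prime-most open reading frame, start through stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return mrna[start:]


messages = [complement_strand(t) for t in minus_strand_templates(GENOME, CORE)]
for mrna in messages:
    print(mrna, "-> translated ORF:", first_orf(mrna))
```

In this toy, every message begins with the same leader and ends with the same 3-prime sequence, and each one presents a different gene as its first open reading frame. Real TRS usage is probabilistic and sequence-context dependent, which this sketch does not attempt to capture.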
What I'm showing you here on the left is a structure of what people think is the polymerase holoenzyme. This is made up of the nsp12 RNA dependent RNA polymerase itself together with two other nonstructural proteins, nsp7 and nsp8, which are thought to help with processivity of the RdRP. As I mentioned, this is thought of as perhaps the core holoenzyme of the polymerase, and it is believed to be able to initiate de novo, primer-independent RNA synthesis. In addition, the complex is associated through protein-protein interactions with another nonstructural protein called nsp14, which is a bifunctional protein that has both capping activity and an exonuclease activity, which turns out to be a real paradigm-shifting activity for how scientists think about RNA virus evolution. I'm going to spend some time talking about that, but first I should mention that it's not just the proteins mentioned above. In fact, there are a variety of other viral processing proteins and activities associated with the replicase complex, not all of which are biochemically well understood, as well as an undefined, or at least incompletely defined, set of cellular proteins that may participate in its regulation. So a very complicated replicase complex is involved in orchestrating this discontinuous transcription mechanism. Right, back to this exonuclease that I mentioned as being part of the polymerase complex. It turns out that the theoretical limit for how large an RNA virus genome can be is about 30 kilobases. This theoretical limit comes from the observation that in RNA viruses, which all have RNA dependent RNA polymerases, these RdRPs do not have proofreading capacity. This is different from the polymerases in our own cells.
This means that they are error-prone, and this error-prone nature of RdRPs underlies the massive evolution that occurs during replication of many RNA viruses to generate what are called quasispecies and mutant swarms, which are highly characteristic of infections like HIV and influenza. What it also means is that most viruses don't even come close to that theoretical limit of 30 kilobases. Most RNA viruses are well below 20 kilobases, and most are on the order of perhaps 10 to 12 kilobases. Now, there are viruses, as I mentioned, coronaviruses and others in a larger grouping of similar viruses called Nidovirales, that have shockingly large RNA genomes of 30 kilobases. We even now know of some that are beyond 30 kilobases, so even exceeding the theoretical threshold. Only these viruses, not all of them but many of them, have this exonuclease activity, and so this led to the idea that this exonuclease activity could actually be conferring a proofreading function on the RdRP, which, as I mentioned, was a real paradigm shift in thinking about how RNA dependent RNA polymerases might actually be able to proofread. Indeed, in SARS coronavirus, if ExoN, this exonuclease gene, is mutated, and the number of substitutions or mutations that occur during replication of this virus is then measured, you can see here from this graph that compared to the number of mutations that occur in the wild-type virus, there's more than a 20-fold jump in the mutational frequency in the virus lacking this ExoN activity. You can see this spread across the rest of the coronavirus genome here, first focusing on the upper panel.
There, the black lines show the frequency of mutation in the populations during infection with the wild-type virus, and the gray lines show the same thing during infection with the ExoN mutant virus, and you can see that there's a significant increase in the number and distribution of mutations that are acquired. Mutation of ExoN also renders these viruses hypersusceptible to mutagens, as shown in the lower panel, where the one tested is 5-fluorouracil. You can see that 5-FU treatment, which is a mutagen, of course increases the mutational frequency of the wild-type virus, but further increases the mutational frequency of this ExoN mutant. It's also interesting that you might expect that if this exonuclease activity is what allows these viruses to reach these enormous genome lengths, it would be absolutely essential for the virus, and for some viruses it is: they cannot tolerate a mutation in ExoN. But for SARS coronavirus and some others, while the mutants are attenuated, they can evolve and adapt over multiple passages to stabilize populations and actually prevent lethal mutagenesis. These adaptations, which you might think of as suppressor mutations elsewhere in the genome, would be expected to do things like increase the processivity, perhaps, of the RNA dependent RNA polymerase, and they may be doing other things as well. So that, I think, is a really interesting concept to think about, and in fact in the murine betacoronavirus called MHV, an ExoN mutant showed clear promise as a vaccine strategy, at least when used in mice, because it was an attenuated strain but subsequently allowed protection from a challenge with a wild-type strain. This nsp14, which is the exonuclease, is really a fascinating protein. It's a bimodular protein that is composed of two different domains that basically have two different activities.
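The link between polymerase fidelity and genome size discussed above can be illustrated with a little arithmetic. This is only a sketch: the baseline error rate is a hypothetical placeholder, not a measured value, while the 20-fold factor and the 30 kilobase genome length come from the lecture. If each nucleotide is miscopied independently with probability p, a genome of length L picks up p × L errors per copy on average, and the chance of an error-free copy is (1 − p)^L.

```python
GENOME_LENGTH = 30_000         # nucleotides, as quoted for coronaviruses
BASELINE_ERROR_RATE = 1e-5     # hypothetical per-nucleotide rate with proofreading
EXON_MUTANT_FOLD_INCREASE = 20 # "more than a 20-fold jump" without ExoN


def expected_errors(rate: float, length: int) -> float:
    """Average number of miscopied nucleotides per genome copy."""
    return rate * length


def error_free_fraction(rate: float, length: int) -> float:
    """Probability that a whole genome copy contains no errors."""
    return (1.0 - rate) ** length


for label, rate in [
    ("with proofreading", BASELINE_ERROR_RATE),
    ("ExoN mutant", BASELINE_ERROR_RATE * EXON_MUTANT_FOLD_INCREASE),
]:
    errors = expected_errors(rate, GENOME_LENGTH)
    clean = error_free_fraction(rate, GENOME_LENGTH)
    print(f"{label}: {errors:.2f} errors per copy, {clean:.2%} error-free copies")
```

Under these assumed numbers, a 20-fold loss of fidelity turns a genome that is usually copied cleanly into one that almost never is. That is the intuition behind the idea that a proofreading activity is what lets these viruses maintain such long genomes.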
So there's this ExoN domain here, which is involved in proofreading, and then there's also a methyltransferase domain thought to be involved in the messenger RNA capping reaction. These two domains are separated by a flexible hinge region, which probably allows the protein to orient in different ways as these different functions are needed. And ExoN works in concert with another nonstructural protein called nsp10. Together these operate as a heterodimer and they function in basically a mismatch repair mechanism. So actually ExoN, this proofreading activity, can efficiently excise ribavirin, which is a nucleoside analog that is commonly used as an antiviral against many different RNA viruses but is known not to work against coronaviruses, and that's because this proofreading activity can basically remove that nucleoside analog and allow the virus to continue to replicate. It's been shown with this mouse coronavirus, MHV, that an ExoN knockout is inhibited more efficiently than the wild-type virus by Remdesivir, which is another nucleoside analog that's being explored extensively right now for its potential to block CoV-2 replication. And what that suggests is that ExoN probably also reduces the incorporation of Remdesivir. And so for that reason it's probably going to be beneficial to perhaps try simultaneous targeting of both the RdRP with Remdesivir and ExoN with some sort of a specific exoribonuclease inhibitor as well. Alright, so now having explained how the virus is able to replicate its genome and get its genes expressed through this incredibly sophisticated replicase complex, I'm going to move on and talk about where this happens in a cell, because it turns out that the virus is able to form these very intricate membrane structures called replication and transcription complexes. So, these are basically interconnected double-membrane vesicles where viral replication and transcription can occur.
And I'm showing you here some images, from a reference that I've cited below, that come from cryo-electron tomography of coronavirus-infected cells. You can see on the far left an EM image showing one of these classic double-membrane vesicles that are formed in infected cells. A more zoomed-out image is shown in the center, where you can see that the cell now contains many of these double-membrane vesicles, and on the far right, what you're seeing is a 3D surface rendering from a cryo-electron tomogram of these, where purple shows the inside of these membranes. And many of them are actually interconnected, in that the outer membrane sort of encapsulates multiple of these vesicles at once. These convoluted membranes are derived from the endoplasmic reticulum, and as I mentioned, many of the double-membrane vesicles, it looks from these tomography experiments, are actually interconnected by their outer membrane and are part of an elaborate network that's contiguous with the rough endoplasmic reticulum. Inside of these compartments is where viral replication and transcription is thought to occur. And so this works for the virus and probably benefits the virus in multiple ways. First of all, by compartmentalizing, they can protect their genome from potential attack by antiviral mechanisms or by nucleases that might be present generally in the cytoplasm. It also can help them concentrate the factors necessary to efficiently replicate and transcribe the viral genome. Because these replication compartments, these RTCs, are essential for replication of the virus, they have been discussed as potential antiviral targets, by trying to disrupt this membrane formation. There's been a lot of work trying to explore how these are formed. And what is known is that there are integral membrane proteins that are part of the replicase complex that are thought to function in vesicle biogenesis.
And the three replicase components that are predicted, at least, to have transmembrane domains in them are nsp3, 4, and 6. And these are thought to be directly involved in vesicle formation. In a study that I'm citing below here, it's been shown that two of these, nsp4 and nsp3, when expressed alone outside of the context of infection, are actually sufficient to drive formation of these double-membrane vesicles, and it's thought that this occurs by an interaction between the luminal loops of these proteins that drives the membrane curvature and vesicle formation. There's also been recent work to try and identify the components of the proteome associated with these replication and transcription complexes. This has been studied with the mouse coronavirus, MHV, using a proximity labeling-based approach involving the biotin ligase BirA, which was fused in the context of the virus to one of these replicase proteins, nsp2, known to localize within these replication compartments. And so through the addition of biotin, which could then be transferred to proximal proteins, these proteins could be purified and identified by mass spectrometry to define the RTC proteome, basically. And then in this particular study that I've cited below, they took these hits and did a targeted siRNA screen to figure out which of the components that are host factors are actually necessary for viral replication, which are the proviral factors here. And I want to note that they threw out hits that compromised cell viability on their own, so what these are are hits that decrease coronavirus replication but don't impact the viability of the cell. And what they noticed, of course, is that there are a number of things involved in cellular transport and vesicle formation, which would be expected and are interesting hits for future follow-up, as well as a number of catabolic processes.
There were several hits in the proteasome, and that finding is kind of interesting, as it could provide a link to nsp3, the coronavirus replication transcription complex protein that is thought to have deubiquitination activity. And then, quite interestingly, some of the top hits were in the translation machinery, these eIF3 components of the translation complex. And they were able to use fluorescence imaging of puromycin-labeled cells, which is basically a pulse-labeling way to detect nascent translation. And this showed really pronounced enrichment of actively translating ribosomes near these viral replication transcription complexes, particularly in early-to-mid infection, indicating that the translation machinery, in addition to the transcription machinery, is recruited near these membranous webs, basically. Also, it was interesting to look at which viral proteins are present within these membrane complexes. In pink are the viral proteins that were significantly enriched, and it makes a lot of sense, because most of these are the nonstructural proteins which are known to be involved in replication and transcription, so they should be there. It's also interesting to look at what wasn't there. And so, for example, one of the proteins that was not significantly enriched there is a nonstructural protein called nsp1. Nsp1 is fascinating. It is a key coronavirus pathogenicity factor. It's a host shutoff factor that basically restricts gene expression coming from the host cell, and it does this via a two-pronged approach. Nsp1 is able to interact directly with the 40S subunit of the ribosome and in this way block translation of host RNAs, and also to mediate endonucleolytic cleavage of these RNAs in a pretty widespread way, leading to broad, accelerated messenger RNA degradation in these cells. And this benefits the virus, perhaps for at least two reasons.
One classic way of thinking about why host shutoff benefits the virus is that it helps viruses shunt gene expression machinery away from the host cell and towards viral needs. The second reason, which has been directly demonstrated for these coronavirus nsp1 proteins, is that this is a general immune evasion tactic, because by promoting widespread RNA degradation, many of these RNAs are going to be things that are induced as part of the interferon response, and this helps the virus delay the interferon response. Notably, nsp1 seems to be specific for cleaving host RNAs, because the 5-prime leader sequence that we talked about for subgenomic RNA synthesis, and that's present on the genomic RNA, appears to protect viral transcripts from nsp1-mediated cleavage. And so this is a selective shutoff of host, but not viral, RNAs. This activity, in particular the thought that it blocks an interferon response, is quite relevant for viral pathogenesis. Indeed, it's been shown that if this nsp1 protein is mutated, and this is a mouse survival curve here, you can see that while mice infected with a wild-type virus generally are dead about 6 days after infection, in the absence of this key virulence factor all of the mice survive the infection. And so mutation of this factor has also been something that's been explored as a potential vaccine strategy. Alright, beyond that nsp1 virulence factor, several of the other things that are not present in these RTC complexes, if you look at these, are basically assembly and virion proteins, things like the matrix protein, the envelope protein, the spike protein, and that makes a lot of sense because viral morphogenesis or assembly is not happening in these RTCs. That's happening in sort of a discrete presumed location, and so it sort of makes sense that they would not be part of these RTC complexes. Additionally, not part of the RTC complexes are many accessory proteins. And so what are accessory proteins?
These proteins and genes are things in viruses in general that tend to be specific to a particular viral species or a particular viral genus. And frequently accessory genes are dispensable for viral replication in tissue culture cells but play really important roles in the virus-host interaction in an in vivo context with the animal or the human. And so what I'm showing you here in this diagram are the accessory proteins, which are labeled in blue, for a number of representative betacoronaviruses. You can see that SARS-CoV-2 is included in this diagram in the center, compared directly with SARS coronavirus. You can see that they, in fact, share a number of accessory proteins that look pretty similar, but I think that it's going to be interesting to compare the differences in the future, as there are, at least from sequence gazing, several notable SARS coronavirus 2 variations in these accessory proteins, in particular in accessory proteins that are involved in interaction with the innate immune response and perhaps countering the interferon response. Some of those are listed here in this table: accessory proteins 3a, b, open reading frame 6, open reading frame 8. Each of these has some notable differences. You'll also note that I'm not going to go through the functions of all of these in the table, though I will point out that even for SARS coronavirus and other coronaviruses the functions of many of the accessory proteins are only partially worked out or not yet established. This is going to be an important area for research in the future. Okay, so we've talked about the composition of these replication transcription complexes, which are formed from this elaborate ER-derived network of vesicles. But once the viral genomes are replicated within these, they need to assemble into new viral particles, and this is called viral morphogenesis. And assembly is basically driven first by association of the nucleocapsid protein with the genomic RNA.
This assembles to form those helical nucleocapsids that, remember, are found in the center of the viral particle. Then these need to associate with the components of the viral membrane. And so these are the spike protein, the matrix protein, the envelope protein. These are all integral membrane proteins that are inserted into the endoplasmic reticulum, and then the nucleocapsid, which is bound to the viral genome, buds into these membranes, perhaps in the ER-Golgi intermediate compartment, labeled here as ERGIC. It's known that budding occurs in association with the Golgi, and then these particles are probably glycosylated at particular sites and released out of the cells through a process that's like exocytosis, so that they can then go on and infect neighboring cells. Okay, so that is the basic replication mechanism of the virus, and now in the last few minutes I want to turn to immune interactions of this virus. First I want to point out that for SARS and MERS coronavirus, and we don't know yet the answer for CoV-2, an interesting feature of these viruses is that they induce very little if any interferon in most cells. And this is illustrated in this image that I've shown here, where you can see that the upper gel shows a signal for interferon beta in control cells, and these are infected with a bunyavirus, which is a negative-sense RNA virus that clearly induces interferon beta quite robustly, as many RNA viruses do. SARS coronavirus stands in stark contrast to that control, in that you can see very little interferon beta signal is induced. And so why is this the case? Well, we've touched on this a little bit already, but just to hammer it home, there are a number of putative interferon antagonists that have been identified in the SARS coronavirus genome: several nonstructural proteins, we talked about nsp1, several accessory factors that I touched on, as well as the matrix and nucleoprotein, which may be able to counteract this as well.
So it appears that this virus, and perhaps CoV-2 as well, has really a multi-pronged approach to dampen the early interferon response to the virus. And this is thought to be really important for viral pathogenesis. And indeed SARS pathogenesis has been shown to be linked to delayed type I interferon signaling and subsequent immune toxicity, and so let's look at this first in terms of survival, in a graph shown here for mouse experiments, where you can see that wild-type BALB/c mice, when infected with coronavirus, tend to succumb to infection by about 6 to 8 days post-infection. However, if you infect mice that are lacking interferon signaling, so they're lacking the Ifnar receptor, they're knockouts for that, with wild-type virus, none of these mice die. That suggests that the interferon response ultimately is linked to death of these mice upon coronavirus infection. And that's not because these mice that lack the Ifnar protein replicate the virus any differently, and that's shown here in this graph in panel D, where you can see that the levels of replicating virus, as measured by plaque-forming units in the lung, are basically very similar between the wild-type mice and Ifnar knockout mice. And so the hypothesis is that the virus is able to replicate to high initial titers because of these accessory factors and other multi-pronged approaches it has to delay the interferon response. But then an interferon response comes on later, and sort of at an inappropriate time, because it can no longer be used to stop the initial virus infection; instead, what this response is doing is driving aberrant recruitment of pathogenic inflammatory monocyte-macrophages and activation of the innate immune response, leading to cytotoxicity.
And so that's shown here in these diagrams, where on the left you have an uninfected alveolus, and these cells upon acute coronavirus infection start to show rapid virus replication because the virus is preventing the antiviral interferon response early. This leads to inflammatory cell infiltration and release, probably both from the infected cells and from these infiltrating inflammatory cells, of proinflammatory cytokines and chemokines, and it is those immune responses that are thought to lead to acute lung injury and acute respiratory distress syndrome, so a clear immunopathology associated with these infections. Finally, I want to note that it's been shown for SARS and also for the circulating human coronaviruses that neutralizing antibody titers, which are shown on the graph here, and the memory B cell responses, which are not shown here, are both short-lived in SARS-recovered patients. And so the black line here shows a cohort of SARS patients that were monitored for neutralizing antibody, and you can see they do mount a robust neutralizing antibody response; however, this response is not sustained, and by a couple of years after the initial infection their response is basically disappearing. Now you see a couple of outlier patients shown in the green line and the orange line, indicating that some people may be able to mount a sustained protective response, but for most people infected with virus, immunity probably wanes, and I think that's going to be important in thinking about, in particular, whether that is also the case for CoV-2 and what that means for continued circulation of this virus. So I think there are a number of really important immunological questions that need to be answered for CoV-2 right now that are going to really greatly inform the thinking about how this virus causes pathogenesis and about control of the pandemic. And I've just outlined a few of these here.
For example, what does seroconversion look like for CoV-2? How long do recovered individuals stay immune? And can they be reinfected? What type of immunity will we get from vaccines? And how does it compare to the infection response, which I've shown here? We also really need more information about what's happening in the older population, particularly in regard to the immune responses and inflammation that are happening in these patients, because in part this will help scientists identify parallels that should be looked for in animal models, and these animal models themselves are in need of significant development for CoV-2. Okay, so with that I just want to end by listing what in my opinion are some of the key open basic science questions about these viruses. First, for SARS-CoV-2, what is the role of the polybasic site in the spike protein in CoV-2 transmission? Is this really a component that has helped speed up transmission of this virus? What are the pathways involved in coronavirus-induced membrane remodeling, and how do replication and transcription complexes temporally and functionally coordinate the various stages of the viral life cycle? What are the biochemical activities and roles of the various proteins that form this highly sophisticated replication transcription complex? How do they coordinate replication and transcription at different stages of the viral life cycle? How do these coronaviruses maintain such a large genome and still have sufficient mutation rates for adaptation and trans-species movement, which we know certainly occurs for these viruses? What are the functions of the CoV-2 accessory proteins, and how do they impact the in vivo growth and virulence of the virus? And will coronavirus 2-infected individuals, or vaccinated individuals, mount protective long-term immune responses?
Okay, so with that I'm going to end, and I first want to acknowledge that I got a lot of assistance in collecting information and slides for this talk from Professor Laurent Coscoy as well as from members of my lab: Divya Nandakumar, Ella Hartenian, Michael Ly, Azra Lari, Jessica Tuckers, and Allison Didychuk. I would also like to mention that if you're not a virologist but you're curious about how viruses and viral research have really informed a lot of the basic understanding of molecular biology, I've recorded an open-access iBio talk on that, the link to which is below. And then, most importantly, I really want to thank all of the coronavirus researchers who generated all of the information that I talked about today and who are playing really key roles in the response to this current pandemic, as well as all of the scientists and medical personnel who are working really tirelessly to fight the pandemic; we are seriously indebted to them. So with that, thank you very much.