- [Voiceover] In this module we'll discuss genome editing using the CRISPR-Cas9 system in mammalian cells. Traditional gene targeting
is technically challenging and relies on the process
of homologous recombination. Spontaneous homologous recombination occurs at a very low frequency, and thus is an intrinsically
inefficient process, which has required the use
of antibiotic selection and other tricks to isolate the rare cells in which gene mutagenesis
has been successful. Genome editing takes
advantage of new technologies that let you introduce
double-strand breaks anywhere you like in the genome. By causing a double-strand break, you can dramatically improve
the efficiency of mutagenesis, whether you're simply
trying to knock out a gene, or trying to knock in a specific DNA variant or stretch of DNA. Genome editing tools have
now been well validated to work both in vitro,
in cells in culture, and in vivo, in
organisms ranging from fruit flies to zebrafish to mice, and even non-human primates. The cell has two major ways in which it can repair double-strand breaks. One method is non-homologous end joining. It takes the two ends and
simply puts them back together. But this is an error-prone process that often results in the insertion or deletion of nucleotides. The other method by which
the cell can repair the break is homology-directed repair. Ordinarily, the cell will use a sister chromatid or chromosome
as the repair template via homologous recombination. The repair template allows the area of the double-strand break
to be cleanly replaced. You can exploit the
homology-directed repair mechanism by providing the cells
with large quantities of a traditional double-strand
targeting vector. Alternatively, you can provide a single-strand DNA oligonucleotide that matches the sequence around the site of the double-strand break. In either case, you can fool the cell into inserting a mutation into the genome by putting the mutation in the middle of the repair template. Here's a schematic of the
two repair mechanisms. On the left, you can see
how homology-directed repair allows you to perform
site-directed mutagenesis and create a specific mutant cell line. On the right, you can see how
non-homologous end joining results in the introduction of a variety of small indels into the genome, generating a variety of mutant cell lines. Over the past decade,
a number of different genome editing tools have
emerged into widespread use, including zinc finger nucleases, meganucleases, and TALENs. Each of the tools has its
advantages and disadvantages. The most recent advance
is the CRISPR-Cas9 system, which has created significant excitement in the biomedical community because of its efficacy and its ease of use. The CRISPR-Cas9 system is based on a recently-characterized
adaptive immune system found in bacterial species,
and used by the bacteria to protect against foreign DNA molecules. The system comprises both
protein and RNA components. The protein, called Cas9,
has a variety of functions. It can act as a helicase and
unwind double-strand DNA. It can recognize and bind
a particular DNA sequence, and recognize and bind RNA sequences. It can produce a
double-strand break in DNA. In the simplified system
that's now being used for genome editing in mammalian cells, the RNA component is a
so-called guide RNA, or gRNA, that's about 100 nucleotides in length. This guide RNA is also
known as the CRISPR RNA. Cas9 binds to this guide
RNA, which itself hybridizes to one strand of double-strand DNA, as indicated by the red oval. Cas9 also binds to several
adjacent nucleotides in the DNA. Thus, a triple complex of
protein, RNA, and DNA is formed. The specificity of this complex is encoded in the first 20 nucleotides
of the guide RNA, indicated here in blue. By changing this 20-nucleotide sequence, one can change the DNA sequence to which the
protein-RNA complex will bind. Once bound, the complex will generate a double-strand break in the DNA. There are some clear advantages
to the CRISPR-Cas9 system. The Cas9 protein is a fixed component. It remains the same, regardless of which DNA sequence you wish to target. This contrasts with other
genome editing tools like zinc finger nucleases and TALENs, where new proteins must be produced for each new DNA sequence
that's to be targeted. With CRISPR-Cas9, it's the
RNA component that's changed. In order to change the specificity of the CRISPR-Cas9 complex, all you need to do is change the first 20 to 21 nucleotides of the guide RNA. Because all this requires is
very simple molecular biology, it only takes a day of laboratory work to create a new guide RNA. Indeed, it's so straightforward
to make guide RNAs, that you can make a large library
of guide RNAs all at once. For example, a library that covers all of the genes in the genome. Another advantage of CRISPR-Cas9 is its multiplexing capacity. If you wish to target two genes at once, you can mix Cas9 with
two different guide RNAs matching the two gene sequences. CRISPR-Cas9 complexes will form and create double-strand breaks in the
two genes simultaneously. With the use of several
guide RNAs, you could potentially target several
genes at the same time. Here's a schematic
showing how genome editing with the CRISPR-Cas9 system works. If you want to knock out a gene, you use a guide RNA whose
first 20 or so nucleotides match a sequence in the
coding portion of the gene. This DNA sequence is
known as the protospacer. Of note, the protospacer must be adjacent to a DNA sequence that
is known as the PAM, highlighted here in red. We'll learn more about
this in a few slides. Cas9 and the guide RNA form a complex on the protospacer in genomic DNA and create a double-strand break. One way the cell can repair the break is non-homologous end joining. It takes the two ends and
simply puts them back together. But this is an error-prone process that often results in the
introduction of indels, which can result in a frame shift mutation that prematurely truncates the protein. If you mutate both alleles, you can generate a full gene knockout. No homologous recombination is needed. No antibiotic selection is needed. Let's say you, instead,
want to knock in a mutation. Again, you design
your guide RNA to match the desired site in the genome and introduce a double-strand break. The other way the cell
can repair the break is homology-directed repair. Along with Cas9 and the guide RNA, you provide a double-strand DNA vector, or a single-strand DNA oligonucleotide, containing your mutation,
along with homology arms to serve as the repair template. At some frequency, the
cell will incorporate the mutation into the genome. Again, no antibiotic selection or any other tricks are needed. Let's highlight a couple of
common research applications of the CRISPR-Cas9 system. It's increasingly being used to generate knockout and knock-in mice. In vitro transcribed RNAs, one a messenger RNA encoding Cas9, the other the guide RNA, are injected into
single-cell mouse embryos. The intent is that mutagenesis occurs at the target site in the
genome in some of these embryos. The resulting blastocysts
are implanted into surrogate mothers, and after
three weeks, pups are born. These pups are then screened for mutations at the target site. This methodology works
with high efficiency, in some cases approaching
100% mutagenesis rate. The obvious advantages
are that knockout mice can be generated without
ever needing to use mouse embryonic stem cells, and the process is much quicker than the traditional approach
of making knockout mice. Another common application entails the use of human pluripotent stem cells, whether human embryonic stem cells or induced pluripotent stem cells, to perform disease modeling. One either starts with a
wild-type stem cell line or with an induced
pluripotent stem cell line bearing a patient-specific mutation. CRISPR-Cas9 is used to either introduce a disease-associated mutation, or to correct the
patient-specific mutation. In either case, the
result is the generation of isogenic stem cell lines that have the same genetic background,
epigenetic background, and so forth. These matched stem cell lines are then differentiated into the
cell type of interest, whether it's cardiac myocytes, endothelial cells, neurons, hepatocytes, and so forth. In principle, any phenotypic difference observed between the
differentiated cell lines can be attributed to the disease mutation. A significant advantage of
CRISPR-Cas9 is its efficiency. However, the danger of using a tool that's designed to cleave
the genome at a target site is that it might also cleave
the genome at a different site and cause so-called
off-target mutagenesis. This phenomenon could potentially confound one's experiments. In general, off-target
effects are thought to be most likely to occur
at sites in the genome with sequence similarity
to the on-target site. Accordingly, several web
servers have been developed that allow you to enter
your on-target site and search through the genome for potential off-target
sites, with a small number of mismatches to your on-target site. This can be helpful in prioritizing among several candidate guide
RNAs for a project, as you may wish to choose the guide RNA that seems to have the least potential for off-target effects. A number of variants of
the CRISPR-Cas9 system are now actively being used
in research applications. Almost all of them are derived from the naturally-occurring system found in the bacterial species
Streptococcus pyogenes. At least for now, the Strep pyogenes Cas9 and its associated gRNA architecture are the standard in the field. There has been extensive
work characterizing its on-target and off-target effects. CRISPR-Cas9 adapted from another species, Staphylococcus aureus, has
recently been introduced. One potential advantage
is that Staph aureus Cas9 is about three-quarters of the size of Strep pyogenes Cas9. Staph aureus Cas9 is just small enough to fit into an adeno-associated
virus, or AAV vector, along with the guide RNA. This makes it possible to use CRISPR-Cas9 for a variety of in vivo
genome editing applications. Initial studies suggest that
Staph aureus CRISPR-Cas9 can have similar on-target efficiency, along with fewer off-target effects, compared to Strep pyogenes CRISPR-Cas9. Here's one system by which to introduce Strep pyogenes Cas9 and a
guide RNA into mammalian cells. You can express them from DNA plasmids. The guide RNA can be
expressed from a plasmid with a U6 promoter, as shown here. Remember that the first 20
nucleotides of the guide RNA can be changed so as to determine the genomic DNA sequence to which the CRISPR-Cas9 complex will bind. The remainder of the guide RNA remains exactly the same. It's very easy to custom
design a guide RNA to bind to a desired DNA sequence. In this system, two complementary single-strand DNA
oligonucleotides, or oligos, are used to insert the
desired 20 nucleotides into the plasmid in such a way as to put them at the 5
prime end of the guide RNA. A single ligation reaction
is all that's needed. Conveniently, the Cas9
protein needs no alteration. The same version of
the protein can be used for targeting of any genomic DNA sequence. In the plasmid shown
here, Strep pyogenes Cas9 is expressed using a
strong promoter called CAG. The plasmid co-expresses a green
fluorescent protein, or GFP, which is convenient for marking cells that are successfully expressing Cas9. After the guide RNA plasmid is completed with a single ligation reaction, the two plasmids can be
introduced into cells, typically by using the techniques of transfection or electroporation. Of note, the two-plasmid system shown here is one of many different
systems that are available to express Strep pyogenes
CRISPR-Cas9 in cells. Here is an analogous system by which to introduce Staph aureus Cas9 and a guide RNA into cells. This system also uses two plasmids, which are similar but not interchangeable with the two plasmids
used for Strep pyogenes that were shown on the last slide. The Staph aureus guide RNA is different from the Strep pyogenes guide RNA. One difference is that
the protospacer length for Staph aureus is 21 nucleotides, rather than 20 nucleotides. Here are some rules for designing the Strep pyogenes CRISPR guide RNA. First, the protospacer is
20 nucleotides in length, so one must choose a
protospacer of that length in genomic DNA. Second, the protospacer must be positioned just upstream of a 3-base pair element that matches the sequence NGG, which means any nucleotide
followed by two guanines. This element is known as the protospacer-adjacent motif, or PAM. The PAM is directly recognized by Cas9. Without the PAM, no complex can form. Next, the 5 prime portion of the guide RNA must match the protospacer. It is this portion that hybridizes to the complementary strand of DNA, the mechanism by which
sequence recognition occurs. Of note, because you're
using the U6 promoter, there's a specific constraint. The guide RNA must start with a guanine in order for it to be transcribed. Thus, you should add a G to the beginning of the protospacer, making
it a 21-base sequence that you're placing at the 5
prime end of the guide RNA. The extra base at the very
beginning of the guide RNA does not affect binding
of the complex to DNA. Here are some suggestions
for choosing a site to target in the genome. First, it's important to note
that the double-strand break generated by Cas9 occurs three base pairs upstream of the PAM in the position indicated here by the red line. When mutations occur by
non-homologous end joining, they tend to occur
right at the break site. It's also important to realize that the CRISPR-Cas9 complex can form on either strand of double-strand DNA. You should always check for
protospacer PAM combinations on both strands in order
to find the optimal one. In general, your goal should be to choose a guide RNA that will position the double-strand break
as close as possible to the actual site at which you wish to introduce a change in the DNA sequence, whether it's an indel to knock out a gene, or a variant you're trying to knock in. When searching for well-positioned protospacer PAM combinations, you may find several good ones. You can then prioritize
among the candidates. For example, you can
profile their possible off-target binding sites
elsewhere in the genome and choose the one that appears to be most favorable in that respect. Finally, if possible, it's best to avoid protospacers that have lots
of guanines and cytosines, or to put it another way, are GC-rich, as this has been suggested
to increase the chance of off-target effects. The rules for designing
the Staph aureus guide RNA are largely the same, with
a few critical distinctions. The protospacer is 21
nucleotides in length, rather than 20 nucleotides. The protospacer must be positioned upstream of a different PAM. The Staph aureus PAM is more complex than the Strep pyogenes PAM, with the sequence NNGRR, where R is a purine,
either guanine or adenine. The optimal PAM is thought
to be slightly longer, with the sequence NNGRRT. As before, the 5 prime
portion of the guide RNA must match the protospacer. Because you're still
using the U6 promoter, the guide RNA must start with a guanine in order for it to be transcribed. Thus, you should add a G to the beginning of the protospacer, making it a 22-base
sequence that you're placing at the 5 prime end of the guide RNA. Here are some more general suggestions for choosing a target site in the genome for your project. If you're trying to knock out a gene, there's quite a bit of flexibility with respect to target sites, because all you need to do
is introduce a frame shift early in the coding sequence of the gene. The exact location is
usually not important. Because it's ideal to make the truncated protein product as short as possible, you'll typically want to target a sequence in the first exon that
contains coding sequence. Sometimes, however, this may not work if the gene in question has
alternative start sites, or alternative splicing of exons. It's always worth checking
in the UCSC Genome Browser to see what gene transcripts
have been identified, and if there is a lot of
heterogeneity among the transcripts with differing start sites
or splicing patterns. It's best to target the
earliest coding exon that's shared by all of the transcripts. If you're trying to knock in a variant, your site selection will
be constrained by the need to place the double-strand break as close as possible to
the site of the variant, ideally less than 10 base pairs away. Keep in mind that when you're identifying the site of a mutation, particularly one that has been reported in the literature, you'll need to use the complementary DNA, or cDNA, sequence, that is, a coding sequence in which all of the introns have
been removed. However, when you're
designing the guide RNA, you'll need to use the genomic sequence surrounding the site. If you use the cDNA sequence, there's a chance that your site is near an exon/intron junction,
and your protospacer may inadvertently span across two exons. Of course, this guide RNA will
fail to bind to the genome, since it doesn't take into account the presence of an intron in
the midst of the sequence. Let's now consider an
example of CRISPR design. Imagine that we're trying to make a cellular model of the
cholesterol disorder known as familial combined hypolipidemia. The responsible gene is ANGPTL3, with loss of function mutations
resulting in the disorder. The most commonly found mutation is the S17X nonsense mutation. Here's our task: To design a Strep pyogenes guide RNA that will let us target
the site of this mutation in a wild type cell. This will potentially
allow us to do two things. It will let us try to
knock in the specific S17X mutation into the genome. However, because the site is very close to the beginning of the coding sequence, we could also use this guide RNA to try to simply knock out the gene by introducing frame shift mutations. Here's the start of the
coding sequence of ANGPTL3. This sequence is taken from
the human genome sequence, so we don't have to worry about missing exon/intron junctions. Highlighted in red is the site of the S17X dinucleotide mutation, which changes a TCC codon into a TGA stop codon. To find a suitable protospacer, we must first look for Strep pyogenes PAMs matching the sequence NGG. If you look in the vicinity
of the mutation here, you'll see that there's no nearby NGG. However, remember that you can design guide RNAs that match
either strand of DNA. So we can also look for PAMs
matching the sequence CCN, which corresponds to NGG
on the opposite strand. Here we find three CCN sequences near the desired mutation site. Let's consider each of
these PAMs one by one. For the first one, because we're now working off the opposite DNA strand, the protospacer will extend
in the downstream direction. The 20-base protospacer,
once you've determined the reverse complement sequence, is shown. If you map the location of
the double-strand break, it'll be three base
pairs away from the PAM, as indicated here by the red line. The break will occur 10 base pairs away from the site of the mutation, which is okay, but not optimal. The protospacer is not GC-rich, so that's an advantage. There's another important consideration when choosing the protospacer, and that's whether the
protospacer and/or PAM overlap the site of the mutation. In the example shown
here, the mutation site falls within the protospacer, which is an advantage. Why is this an advantage? Consider the following scenario where the protospacer and PAM do not overlap the site of the mutation. CRISPR-Cas9 introduces
a double-strand break. The desired knock-in
mutation is successfully introduced into the genome by homology-directed repair. Because the protospacer and PAM have not been changed in
the knock-in mutant allele, the guide RNA is still a perfect match for the genomic sequence, and CRISPR-Cas9 can go back and re-cleave the same DNA. If an indel then occurs via
non-homologous end joining, then the knock-in mutant
allele will be disrupted. It's possible that the
experiment will ultimately yield no clean knock-in alleles. This scenario can be avoided,
or at least mitigated, if the protospacer or PAM is altered by the knock-in mutation, resulting in a sequence mismatch. Then it becomes less likely
that re-cleavage will occur. It's worth noting that single
or even double mismatches may not eliminate re-cleavage, especially if the mismatches occur near the end of the protospacer that's far away from the PAM. Mismatches near the PAM tend to have more of an inhibitory effect. Disruption of the PAM itself, so that it no longer
matches the sequence NGG, will almost certainly
eliminate re-cleavage. Here's the second possible protospacer. The break will now
occur just one base pair away from the mutation site. The protospacer is not GC-rich, which is an advantage. The mutation site falls
within the protospacer, which is an advantage. Here's the third possible protospacer. The break will occur a little further away than the last one, four base pairs away. The protospacer is not GC-rich, which is an advantage. While the protospacer would not
be affected by the mutation, the PAM itself would be, and that would essentially eliminate the possibility of re-cleavage, which is an advantage. On paper, the second protospacer appears to be the optimal one, since it results in cleavage very close to the mutation site. However, the best way to choose
among the three candidates may be to actually test
them in human cells to empirically assess which has the highest on-target efficiency in vitro. If the second protospacer
shows much less activity than the first and/or third protospacer, then it may not be the
best choice after all. Let's assume that the second protospacer turns out to be the best choice. The next step is to design
the oligonucleotides that we can use to place
the protospacer sequence into the plasmid that will express the guide RNA in cells. Recall that we have to
add an extra guanine, shown in red, to the beginning of the protospacer sequence, shown in blue. We can use these template oligos to design the oligos that
will specifically target the site of the ANGPTL3 S17X mutation. Note that the templates shown here are displayed in such a way as to convey how they'll hybridize to form a small double-strand DNA insert that can be ligated into the vector. When the desired protospacer
is encoded into the oligos, you see the result at
the bottom of the slide. At this point, we can
simply purchase these oligos from a vendor. If we were simply trying
to knock out the gene, we'd be done, since all we'd need are the guide RNA plasmid, which we can now complete
with a single ligation step, and the fixed Cas9 plasmid
that can be used as is. However, if we wanted to knock in the specific S17X mutation, we'd need a repair template as well, ideally a single-strand
DNA oligonucleotide. It's worth emphasizing
that knocking in a mutation relies on homology-directed repair. However, even in the best-case scenario, non-homologous end joining will occur in parallel with homology-directed repair. So even if you're adding
a single-strand DNA oligo as a repair template, you'll likely obtain a mix of cells, some with the S17X mutation, and others with indels at the
site of the desired mutation. There's not yet a standard method to enrich for the first
type, which we want, and prevent the second
type from occurring, which we may or may not want, although such methods
are under development and are starting to be
reported in the literature. To design the oligo repair template, you can simply take the desired mutation and flank it with at least 40 nucleotides of homology on both sides, taken directly from the genomic sequence. In this example, the
dinucleotide S17X mutation is embedded in the middle of
a single strand DNA oligo, with 40 nucleotides of
homology on either side. We can simply purchase
this oligo from a vendor. In practice, we'd probably choose to use even longer regions of homology, as it's feasible to obtain oligos that are up to 200 nucleotides in length, and there are data to suggest that longer homology arms will
increase the efficiency of homology-directed repair. The final step is to develop a method by which to screen for mutations introduced at the target site. The most straightforward way to do this is to design PCR primers that will amplify a region surrounding the
target site in the genome, ideally with the target site located in the middle of the amplicon. The amplified PCR product can be used to assess the overall mutagenesis rate through the use of assays
that detect mismatches among the DNA sequences
present in the PCR product. The PCR product can also be subjected to Sanger sequencing, or
next-generation sequencing, in order to identify
the specific mutations introduced at the target site.
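
As a recap, the Strep pyogenes design rules from this module can be sketched in a few lines of Python. This is only a minimal illustration, not a validated design tool. It scans both strands for a 20-nucleotide protospacer followed by an NGG PAM, places the break three base pairs upstream of the PAM, reports the GC fraction, prepends the extra G needed for the U6 promoter, and builds a single-strand repair oligo with 40-nucleotide homology arms. The names `find_spcas9_sites`, `build_ssodn`, and `Candidate` are hypothetical, the toy sequence is made up rather than taken from ANGPTL3, and off-target profiling is left out entirely.

```python
from dataclasses import dataclass
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Candidate:  # hypothetical container, for illustration only
    strand: str         # "+" = strand as given, "-" = opposite strand
    protospacer: str    # 20 nt, read 5'->3' on the strand that carries the PAM
    pam: str            # 3 nt matching NGG
    cut_index: int      # break falls between given-strand bases cut_index-1 and cut_index
    gc_fraction: float  # fraction of G/C bases in the protospacer

    @property
    def guide_5prime(self) -> str:
        # Extra leading G so that the U6 promoter can start transcription
        return "G" + self.protospacer


def find_spcas9_sites(seq: str, length: int = 20) -> List[Candidate]:
    """Scan both strands for a protospacer immediately 5' of an NGG PAM."""
    seq = seq.upper()
    n = len(seq)
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(n - length - 2):
            proto = s[i:i + length]
            pam = s[i + length:i + length + 3]
            if pam[1:] != "GG":
                continue
            cut = i + length - 3  # three base pairs upstream of the PAM
            if strand == "-":
                cut = n - cut     # convert to given-strand coordinates
            gc = sum(base in "GC" for base in proto) / length
            sites.append(Candidate(strand, proto, pam, cut, gc))
    return sites


def build_ssodn(genomic: str, start: int, end: int, mutant: str, arm: int = 40) -> str:
    """Replace genomic[start:end] with `mutant`, flanked by `arm` nt of homology."""
    if start < arm or end + arm > len(genomic):
        raise ValueError("not enough flanking genomic sequence for the homology arms")
    return genomic[start - arm:start] + mutant + genomic[end:end + arm]


# Toy input (made up, NOT the ANGPTL3 sequence): one protospacer plus a TGG PAM
toy = "AAAAA" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAAA"
for site in find_spcas9_sites(toy):
    print(site.strand, site.guide_5prime, site.pam, site.cut_index, site.gc_fraction)
```

In a real project you would feed in genomic sequence around the target site, rank the candidates by how close `cut_index` falls to the desired change, and then check the shortlisted protospacers for off-target sites with one of the web servers mentioned earlier.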