Genome Engineering Workshop - Day 1

Okay, we're going to go ahead and get started here and try to stay on schedule, as one always optimistically hopes at the start of a meeting. So welcome to the 2018 Genome Engineering Workshop, hosted by the Zhang lab. My name is Rhiannon. I'm a member of the Zhang lab and one of the organizers for the meeting, and we are delighted to have you all here. Our real goal is basically to teach you as much as we can in the next day and a half and answer as many of your questions as we can. You may have seen this slide floating past: most of the people from the Zhang lab are clearly demarcated in our gray shirts, so if you have questions, find someone in a gray T-shirt and we will do our best to get your questions answered.

So, the format for today: we're going to have three talks from people from the Zhang lab, and then we'll have breakout sessions after that where you can go and get one-on-one help and advice on your experiments. Then we'll have a keynote speaker, Eugene Koonin, this afternoon, and then a poster session with beer and snacks, so you should definitely try to stay all afternoon if you can. I know it's Mother's Day. I'm a mother, I appreciate anyone who came even if they maybe had family obligations, and we'll definitely try to make it worth your while. With that, I'd like to introduce our first speaker. Linyi is a graduate student in the lab. He's going to talk about the basics of CRISPR, introduce you to all of this, and get things started. And I can see he's wearing his fancy shoes today, so I think you're going to get a really good talk. Thank you.

Thank you very much, Rhiannon, and thank you all so much for coming. If you're coming from out of town, welcome to Cambridge, and a very happy Mother's Day to all of you. As Rhiannon mentioned, I'm a graduate student in the lab, and today it's my privilege to talk to you about some of the work we've been doing on CRISPR-Cas systems, and in particular genome editing with Cas9. Bit of an
echo. Can everyone hear me okay in the back? Yeah? Okay, great, thank you.

So today I'd like to talk first a little bit about the basics of using CRISPR-Cas9 for genome editing in the lab, and then highlight some of the frontiers and current developments in CRISPR technology. The story starts with the sequencing of the human genome almost 20 years ago now. Since then, the cost of sequencing genomes has decreased exponentially, especially with the advent of next-generation sequencing technologies in the past 10 years or so. With that, we've accumulated a wealth of data about the genomes of organisms in all kingdoms of life, and in particular about our own genome. In the process we've gained a lot of insight into disease processes, some of which are shown on the slide here. But there's also much still to be learned: even though we're very good at reading, we don't necessarily understand everything that's encoded in the genome, what it means, and its implications for biology. Complementing these genome sequencing projects, a lot of detailed insight has also come from genome-wide association studies, which map out correlations between different kinds of variables. What I think is still an open question, still something to be solved, is the question of causality: which genes actually cause which phenotypes, and how do we systematically interrogate that? To do that required the development of a new technology that we can use not only to read genomes but to manipulate them in a targeted way.

The idea of doing genome editing is actually quite an old one, maybe 30 years old or so: if you create a double-stranded break somewhere in the genome, the host enzymes in the cell repair that break in a way that facilitates genome editing. So if we start out with a break somewhere in the genome, typically one of two things
can happen once the cell recognizes the break. The first pathway is called non-homologous end joining: the cell recognizes the break, goes into a sort of panic mode, and tries to stitch the break together in any way it can to keep from dying. In the process it has a propensity to create insertion and deletion mutations, really small, you know, 1 to 20 base pairs or so, right at the break site. If that happens in a protein-coding gene, you can knock out the gene and inactivate it. The second pathway, which is typically less prevalent and less favored in cells, is homology-directed repair. In this case, in addition to the break, the user also supplies a template sequence that matches both sides of the break, but the middle, shown here in blue, is different. With some efficiency, the cell can recognize that and use it as a set of instructions to repair the break according to what was in the template. So that's a way to create more precise genome edits than non-homologous end joining.

So the question is, how can we create this double-stranded break in the cell? We know that's very useful, but the genome is 3 billion bases long, and it's kind of like finding a needle in a haystack. In the early days people were using designer proteins. These are proteins that use specific protein-DNA recognition to target particular sequences in the genome. They were very expensive: as of 2012, maybe $5,000 to create a novel custom designer nuclease. Early generations included zinc finger nucleases, which recognize about three bases at a time based on protein recognition, and a bit later on TALENs, which have a one-to-one base correspondence through two hypervariable residues encoded in a modular protein architecture. So this is extremely useful and enabling, but it's also a little restricting, because for every new target site you
need to engineer a totally new protein.

So here's where CRISPR comes in. It's an RNA-guided DNA recognition module that's actually an anti-phage system in bacteria. It's a bacterial immune system that stores a piece of the viral DNA from a previous infection and then uses that as a reminder in future infections to cleave the viral genome. So this is an actual image of it, and a cartoon of how CRISPR works. At the heart of CRISPR is a protein-RNA complex called Cas9, shown here in this cartoon. Cas9 is an RNA-guided nuclease that targets the phage genome. In recent years a pretty comprehensive categorization of the different kinds of CRISPR systems out there has emerged, and there's roughly one watershed between them. In one kind, the effector function that targets the phage genome is split over multiple proteins; this is class 1. Class 2, which includes Cas9 and some other proteins, has a single large effector protein that's a lot more amenable to adaptation and harnessing for genome editing, because we don't have to worry about all these proteins coming together, just a single one. So I'm going to talk for the rest of the talk about class 2 systems.

As I mentioned, the most well known of these proteins is Cas9. This is an RNA-guided programmable protein that cuts DNA. This is the target DNA strand, and Cas9 is this protein in yellow. It binds to a piece of guide RNA that consists of a variable region, which is complementary to the target site that we want to edit, and a constant region that's required for activation of the Cas9 nuclease. The guide RNA invades and unwinds the target and looks for sites where it's perfectly complementary. When it finds a site like that, it also has the secondary requirement that the sequence right next to it, called the protospacer adjacent motif or PAM, in this case two guanines,
is also there. So if we have complementarity and we have the PAM, Cas9 unwinds the DNA with the guide and makes a cut in the DNA, and we can create targeted double-stranded breaks this way. Of course, the main advantage of Cas9 is that we can keep the protein the same for all experiments and only change the guide RNA, which is much easier than engineering a totally new protein. So that's the protospacer adjacent motif.

Here is just a quick illustration of how that works in the cell. Here Cas9 has been put into the nucleus. It finds the PAM sequence, and it uses a guide RNA, shown here in red, to unwind the target DNA. If there is perfect base pairing, as in this case where all the bases match, then Cas9 creates a double-stranded break in the DNA. That break is then recognized by host factors, and there's some shuttling and lots of things going on here, but at the end there may be some mutations at the target site right near the break, shown in red, which can then inactivate the gene of interest.

The cleavage activity of Cas9 is not its only function. It turns out Cas9 has two nuclease domains and two key active-site residues, one in each of these domains, and if we inactivate them with alanine substitutions, Cas9 can be turned into something that binds but doesn't cut DNA, which is a useful alternative platform for recruiting things to DNA without actually cutting it. So that's the basis for CRISPR-based genome engineering: using Cas9 as a nuclease for genome editing, as well as having just a programmable DNA-binding module that doesn't cut, and some other very interesting, more recent developments that don't fall into these categories, which I'll touch on a bit later.

In the past couple of years this has enabled a lot of applications in many domains of biological science, from engineering organisms and crops to allowing us to systematically
interrogate the consequences of genetic variation in research. So I'm going to spend the rest of the talk first on outlining the prototypical CRISPR experiment and some of the design considerations that go into it, and then on some selected frontiers from more recent years in how the CRISPR toolbox has been expanding.

This is a prototypical workflow for generating a model cell line, either a knockout or a knock-in, using CRISPR-Cas9. We first start with designing the guides that we want to deliver, then constructing the reagents and thinking about the delivery method, whether it's a plasmid or a virus or the protein itself, and then actually doing the CRISPR experiment, functionally validating, and obtaining a cell line of interest. So I want to first talk about the design of this experiment and some of the things we might think about when designing a CRISPR knockout.

Arguably, here are five of the main strategies we can use to modify genomic sequences. They differ by the number of guides that we deliver, the pathway in the cell that's responsible for effecting the change, and whether or not we supply a donor sequence. In the first case we don't create any double-stranded breaks. This is a new technology, developed by David Liu's lab and others around the world, where we directly change a single base, either a C to a T or an A to a G. It delivers one guide, it's called base editing, and it doesn't use a double-stranded break. Then there are three different strategies with a single double-stranded break and a single guide: knocking out a gene by just creating a cut and letting NHEJ take over to get a mixture of insertion and deletion mutations; doing the same thing but supplying a donor to try to specify the kind of mutation that occurs; and using NHEJ with a large donor that gets ligated into the break site to insert a large cargo into a
specified break site. And then finally, using two guides to delete a segment of the genome in a programmed way.

So there are some considerations for designing the guide sequence or sequences for each of these strategies. Some general considerations are the cleavage efficiency and the off-target activity, which apply to all of these strategies, and I also want to talk about some special considerations for choosing guides for each one. There's a lot of great work, done here and elsewhere, to understand what makes a good guide, what makes it efficient and what makes it specific, and a lot of that has been codified in online software. There are many tools available, and they're highly valuable. I want to highlight one that I'm particularly fond of, which is a tool in Benchling. It's an integrated on-target and off-target prediction tool that combines state-of-the-art on-target and off-target scoring in one workflow. You can have it pick guides for you in an automated fashion and tell you what the scores are and where each guide sits in the target site of interest. One thing that's important to note is that, even though these scores are very carefully thought out, they're really approximate, and they're not a definitive measure of how the guides actually work. You can oftentimes find guides with lower on-target scores that are more efficient when you test them in the lab than guides with high on-target scores, and vice versa. So it pays to look broadly and create a list of things that you might try out.

As I mentioned, there are also more specific considerations for each of these strategies in terms of what makes a good guide. So I want to talk about knocking out a single gene with a single guide by indels. It turns out that not only is the efficiency of the guide important, but what kinds of indels you get after you
deliver the guide is also really important. If you're trying to knock out a gene, you're trying to create lots of frameshift mutations instead of in-frame mutations, so picking a guide that will give you frameshift mutations might be really important. A lot of work in recent years has highlighted that, even though you create a double-stranded break in the genome, the kind of repair you get is really non-random, and it depends on the sequences right next to the break site. So here is a particular Cas9 target, with the Ns indicating the variable region and the red triangle indicating where the break happens. I'm going to number these bases for future reference, starting from the 3' end and going to the 5' end, so base 1 to base 20, and the highlighted base here, shown in red, is the fourth base. One of the observations that we and others have made is that the kinds of mutations you get depend on which base is at position 4 in the target site.

So here's some data showing that. This is a bit of a busy plot, but to take you through it: on the x-axis is the percent indel, so the overall efficiency of getting cuts in the genome. On the y-axis is how many insertions there are among the indels, that is, how many of them are single-base-pair insertions, so these would all be frameshift alleles. Each dot is a different guide sequence in the human genome, and this is stratified by the identity of the base at position 4. If we have a G or C at position 4, most of the guides don't have a very strong level of these single-base-pair insertions. But an interesting observation is that if you have an A or T at the fourth base, then the probability of getting a single-base-pair insertion is much higher. So it might be a slight design consideration: if you're trying to get a lot of frameshift mutations, picking a guide with an A or T at this base can help.
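[Editor's sketch] The position-4 observation described in the talk can be written as a small guide-filtering helper. This is a minimal illustration, not a tool from the talk: the function names are hypothetical, the example sequences are made up, and it assumes each guide is given as a 20-nt protospacer written 5' to 3' without the PAM, so that base 1 in the speaker's numbering is the last character of the string.

```python
def base_at_position(protospacer: str, position: int) -> str:
    """Return the base at a 1-indexed position counted from the 3' (PAM-proximal) end.

    Assumes a 20-nt protospacer written 5'->3', PAM not included.
    """
    seq = protospacer.upper()
    if len(seq) != 20 or set(seq) - set("ACGT"):
        raise ValueError("expected a 20-nt protospacer made of A/C/G/T")
    if not 1 <= position <= 20:
        raise ValueError("position must be between 1 and 20")
    # Position 1 is the last character, so position k is index -k.
    return seq[-position]


def favors_single_base_insertion(protospacer: str) -> bool:
    """Heuristic from the talk: A or T at position 4 tends to give more 1-bp insertions."""
    return base_at_position(protospacer, 4) in ("A", "T")


def rank_guides(protospacers: list[str]) -> list[str]:
    """Put guides with A/T at position 4 first; otherwise keep the input order."""
    # sorted() is stable, and False sorts before True, so we sort on the negation.
    return sorted(protospacers, key=lambda g: not favors_single_base_insertion(g))


if __name__ == "__main__":
    candidates = [
        "GACGTTACCGGATTCAGCTA",  # hypothetical guide, G at position 4
        "GACGTTACCGGATTCAAGCT",  # hypothetical guide, A at position 4
    ]
    for guide in rank_guides(candidates):
        print(guide, base_at_position(guide, 4), favors_single_base_insertion(guide))
```

As the speaker notes for on-target scores in general, a flag like this is only a rough prior; it is meant for ordering a candidate list to test in the lab, not for predicting outcomes.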
The second consideration is microhomologies that are present at the target site. If you have bases that are very similar on each side of the target site, there's a very strong tendency to collapse those microhomologies into a single one. So here, for example, the break is shown in green, and there are two microhomologies, TCA, on both sides. With probability much higher than random, a lot of the repair outcomes will just delete everything in between the TCAs and leave a single TCA. So if these TCAs are positioned in such a way that collapsing the microhomology gives you an out-of-frame mutation, that can be used to your advantage. There's a great prediction tool at this website that you can use to predict whether or not there are strong microhomologies near the cut site.

The next consideration is for a different strategy: instead of using a single guide to get a mixture of indels, using that one guide together with a donor template to create a well-defined cell line. Here, the distance from the cut site to your intended mutation plays a big role in how efficient things turn out to be. So here is a paper from 2016, great work demonstrating the effect of distance to the cut site on HDR efficiency. There are three categories here, either getting double mutants, so homozygous lines, a wild-type line, or a heterozygous line, and there's a different optimal distance for each case. If you're trying to get a homozygous knockout or knock-in line, then it's best to choose the cut site to be as close as possible to the intended site of mutation. But if you're trying to get a heterozygous one, it may actually be a little better to be farther away, because if your guide is too efficient, then maybe too many of the clones you obtain are homozygous, and there's a trade-off there. So moving it out by about 10 to 12 bases or so can help
obtaining a heterozygous line.

So to summarize all of that: for general considerations, online tools like Benchling are really good at providing general guidelines for what makes a good guide in terms of on-target and off-target activity. And there are also strategy-specific considerations: for knockouts, checking for microhomologies or an A or T at the fourth base, and for HDR, picking guides at a good distance from the site you want to mutate.

Okay, so this is the roadmap for the first part of the talk, and I'll move on to the second point, which is constructing reagents and thinking about how to deliver CRISPR into the cell line or animal model of interest. These are arguably three of the most popular ways to deliver Cas9 into cells: plasmid-based vectors, or lentivirus or AAV. Lentivirus will give you a stable integration into the genome at a more or less random location. AAV is really good for in vivo delivery in a live animal. And then protein delivery is also very good for having very low off-target effects and low toxicity.

For plasmid delivery, there are many, many reagents available on Addgene, which is a great resource. This is a highlight of some of the more popular reagents from our lab. The PX330 series just expresses Cas9 and the guide on a single backbone. PX458 and PX459 are extensions of PX330 where, in addition to Cas9 and the guide, there's either a GFP or a puromycin resistance selection marker. There's also a lentiviral vector for packaging lentivirus with Cas9 and guide, and an additional vector that has a different species of Cas9, which is a bit smaller and packages better into AAV. Addgene has great resource pages to read more about these.

So after delivering the reagents and obtaining knockouts or knock-ins, we want to assess how well the editing actually happened. This is the functional validation and further characterization. To my knowledge, there are three main ways to assess
how much indel you're getting from a CRISPR experiment. The first is using gel electrophoresis, with the Surveyor or T7 endonuclease I assay. The second is a Sanger sequencing method called TIDE, which I'll talk about in a bit. And the third, our favorite one since it's probably the most sensitive, is using deep sequencing to actually sequence the kinds of mutations that happen at the locus of interest. I'm going to briefly go through each one of these right now.

Surveyor is a method to quantify indels using gel electrophoresis. We start off with a mixture of genomes, some of which contain mutations and some of which don't, and we want to figure out what percentage of the genomes contain mutations. So here's an indel here, and some amplicons with no indel. If we amplify this entire mixture, we get a mixture of amplicons, and then if we melt them and reanneal them to each other, instead of the red one reannealing with the red one, we get some percentage of the red one and the black one reannealing, and so forth. When we incubate this with the Surveyor nuclease or T7 endonuclease I, it recognizes bulges like this, where there's not perfect base pairing, and it makes a cut wherever this sort of bulge forms. So this is a way to distinguish between stuff that reanneals perfectly and stuff that reanneals with a bulge in it. Once the nuclease cuts, if there was a bulge there, we get two fragments, split exactly where the CRISPR editing happened. If we run that on a gel, we can see the fragments at defined locations relative to the uncut band, and quantifying the intensity of the fragments is a good way to assess editing efficiency.

The second method is TIDE, or Tracking of Indels by Decomposition. This is a similar idea to Surveyor, except we're not using the Surveyor nuclease; instead we're using Sanger sequencing to deconvolute what's edited and what's not. So if we start out with a mixture
of products, some of which have edits and some of which don't, and we just Sanger sequence the entire mixture: if there are no edits, the trace looks clean all the way through. If there are edits, it looks clean up to a certain point, and then we hit the editing region, where there are insertions and deletions, and stuff just gets super messy. What's really cool is that, using a sophisticated computer algorithm, we can deconvolute the sequences that make up this mixture and get an estimate of the different kinds of sequences that are in the mixed population.

The third strategy is using deep sequencing. Here we take the entire region, PCR the mixture, sequence single molecules, and then count how many of the sequencing reads have indels and how many don't.

So what's the best thing to use? I think there are advantages and disadvantages to each one. These are definitely much faster and probably more amenable if you have a small number of samples: Surveyor and TIDE typically give maybe a 1 to 2% level of sensitivity. Surveyor tends to underestimate the indel rate, because the types of mutations are not totally random, so you don't always get cleavage even if there's a mutation there. If you have access to deep sequencing, we've found that it's the most sensitive strategy. The downside is that it's pretty high throughput, so it might be hard to get a single sample onto a MiSeq to quantify it that way. For just assessing overall how well things worked, I think Surveyor and TIDE work pretty well for a small number of samples.

Okay, so now I want to talk a little bit about some frontiers from recent years in expanding the genome editing toolbox, the things other than CRISPR-Cas9. The first application is actually a little bit old, back in 2014, when our lab created a Cas9 mouse. This is a mouse that actually just
has Cas9 in there already, and if you just deliver the guide to that mouse, it can start creating edits by itself. So shown here is a Cas9 mouse where one guide has been delivered, and you can see accumulation of edits at this particular locus, and also a phenotypic validation. So this is a way to accelerate disease modeling inside a live animal.

CRISPR also enables high-throughput screening, so not just targeting one gene at a time but doing it on a genome-wide basis. Here's one example of that. This is a patient with melanoma who's been treated. Here you can see tumors all over the skin, and after 15 weeks of drug treatment there's a profound phenotypic reversal where most of these tumors go away. But unfortunately, after a few more weeks of treatment there's a relapse where tumors start coming back, and the molecular mechanism underlying this process is in part still unclear. One thing CRISPR can do to help us understand a system like this is to systematically interrogate every gene in the genome and ask what might be responsible for causing this kind of resistance. Here we can use oligo array synthesis, which is fairly inexpensive, to create a library of guide RNAs that target every gene in the genome, and clone that into a pooled library of lentiviruses, which are then used to infect these cells and screen in a pooled way for what's giving rise to resistance and what's not. Here the guide RNA acts both as a barcode for which gene got edited and as the vehicle for actually editing the gene. So here's that same experiment done in a cell line model, where we're screening for which genes give resistance to that drug, and we find that there's this collection of genes up here, out of all the genes screened genome-wide, that are highly enriched and possibly confer some pathway modifications that give resistance. So Julia's
talk will be next, and she'll talk about all sorts of different screening approaches, both activation and knockout, and all the different variants that are possible today.

One set of recent developments is using not just the wild-type Cas9 that came directly from bacteria but also engineered Cas9 variants that have different properties and allow us to target DNA differently. One theme of research has been engineering the Cas9 protein itself so that it's inherently more specific at targeting DNA. To the best of my knowledge, there are currently seven different high-specificity Cas9 variants out there, which are great work and very useful, starting with the first two, published in 2016, eSpCas9 and SpCas9-HF1, and then more recently additional variants that also have fantastic specificity. As before, all of these variants are available on Addgene, I think. So these are really good ways to reduce off-target effects if you're concerned about them. It's also been our experience that, for at least some targets, at least some of the variants have a tendency to lose a little bit of activity. So one consideration when using a high-specificity Cas9 is that you might want to test a couple more target sites to make sure you're getting a guide that works really well. The second consideration is that these Cas9s, with the exception of Sniper-Cas9, usually only work well with a fully matched 20-base-pair guide sequence. So if you're trying to introduce a mismatch at the 5' end for use with the U6 promoter, that doesn't really work very well with a lot of these, especially these first three up here. So using a fully matched 20-base-pair guide is essential.

Cas enzymes have also been engineered to be better at recognizing different DNA sequences, in terms of broadening the PAM recognition motif. So here are two
variants of SpCas9, one variant of SaCas9, and two variants of another enzyme, Cpf1, which we'll talk about in a bit, that can recognize PAM sequences the original bacterial enzymes could not. This is great work from Dr. Keith Joung's group at MGH, which you might hear a little bit about tomorrow, engineering variants of SpCas9 to recognize not only NGG but also NGA and NGCG PAMs, and SaCas9 to recognize NNNRRT PAMs. These significantly expand the number of target sites that can be hit in the genome. More recently, we've also done some work on engineering the specificity of another enzyme, Cpf1, which I'll talk about, to recognize TTCV and TCCV PAMs and TATV PAMs, in addition to the triple-T PAM that it recognizes normally.

As I mentioned earlier, one really exciting new development is the ability to edit single bases in the genome without creating a double-stranded break. This slide is highlighting some work from Dr.
David Liu's lab here at the Broad, using an engineered Cas9 nickase fused to an adenosine or cytidine deaminase domain. This unwinds the DNA and, instead of making a cut, uses the deaminase to pull an amine group off either an A or a C, which converts that base into a G or a T. Making point mutations is sometimes a large effort, and this can significantly improve the speed and efficiency of an experiment like that.

Okay, so finally I want to highlight some work on expanding the CRISPR toolbox to not just Cas9 but other kinds of CRISPR systems that are out there. In the last couple of years we've really come to understand that CRISPR systems are very diverse: it's not just one Cas9 but many Cas9s, and it's not just Cas9 but many other kinds of enzymes that are more diverse than that. There's now a consensus that there are three kinds of class 2 CRISPR systems, two that target DNA and one that targets RNA. Of the two DNA-targeting systems, one includes a large family of Cas9 enzymes, and the second includes Cas12, one member of which is Cpf1, which has been harnessed for genome editing. In addition to the two DNA-targeting systems, there's a third system that targets and cleaves RNA instead of DNA, and Max will talk a little bit about that later this afternoon.

So I want to talk a little bit about some work on Cpf1. Cpf1, or Cas12a, is an alternative editing enzyme to Cas9. Here is the genetic locus in the bacteria for Cas9 versus Cpf1, and you can see some major differences already. There's a difference in the architecture of the protein itself, as well as in the architecture of the neighboring genes and the RNA around it, and this can be used to advantage for genome editing. Here is a comparison between Cas9 and Cpf1. Cas9 recognizes an NGG PAM right after the target site, and it contains two nuclease domains and a
relatively long piece of RNA, the guide RNA. In contrast, Cpf1 has its PAM on the other side of the target site, at the 5' end, and instead of a G-rich PAM it has a T-rich PAM, so three Ts instead of two Gs. What's interesting is that, because it doesn't require the tracrRNA, the guide RNA is much shorter, and because of that it's a lot easier to deliver guides with Cpf1. In addition, Cas9 creates a blunt-ended cut, whereas Cpf1 creates a staggered cut with a three to five base pair overhang.

One of the advantages of Cpf1 is being able to take advantage of the fact that it has a much shorter piece of RNA. So here's Cas9. This is the pre-crRNA that's transcribed, and it's hybridized with a bunch of copies of the tracrRNA, which is required for getting this pre-crRNA processed into individual mature crRNAs that can target a particular site in the DNA. This requires Cas9, and it requires the host factor RNase III. In contrast, with Cpf1 this is a lot simpler, because we don't have the tracrRNA requirement, and it turns out that if you just supply the crRNA array with Cpf1, Cpf1 itself can cleave it into individual guides.

So this is cool, but why is it useful? It's useful because it allows us to target multiple genes at the same time in an efficient manner without having to deliver multiple guides at the same time. So here's an experiment that Bernd and Matthias and others in the lab have done to demonstrate the utility of this feature. Here is an editing experiment where we have expression of Cpf1 in human cells as well as an array of guides right before the Cpf1. This is an array of four guides that target four different genes in the human genome, and it's delivered in one of four different formats: either as single guides, with each guide driven by its own promoter, or as one big array driven by a single promoter at the very beginning, and
there's three configurations of the array depending on the length of these direct repeats and the guides. And the takeaway that's very interesting is that with this shortest configuration, array one, the targeting efficiency of each of the four genes is basically the same as if we deliver each guide separately with its own U6 promoter. So this is a much easier configuration to program and to clone and to make sure no recombination happens, so just by delivering this one array we can simultaneously target four different genes. In addition to Cpf1, the alternative DNA-targeting system, there's also a third class 2 CRISPR system, the RNA-targeting system Cas13a and Cas13b. So one of the applications that we can build off of targeting RNA instead of DNA is actually to edit the RNA instead of editing the DNA. So this is a version of RNA-based editing that our lab developed recently, where we can take an A in the RNA, adenosine, which has this amine group, and if we can basically remove this amine group and replace it with a hydroxyl, this A gets converted into inosine, which is recognized as guanosine in the cell. So if we can convert A to I, that's effectively an A-to-G edit in the RNA. So how do we do that? We can use Cas13 here as a module to target the RNA and then fuse an adenosine deaminase domain to the Cas13 that will then recognize the base that it's kind of programmed to recognize, the A, and then deaminate that into an I, which is then recognized as a G. So how efficiently does that happen? These are different genes in which this editing has been attempted, and you can see very efficient editing with targeting guides, up to about 30%, and Max will talk a lot about this later on in the afternoon. So I want to end on one thought, which is that there's been a sort of continued search for technologies, and we've learned a lot from the natural diversity of CRISPR systems that are already out there. We've learned that there's
two different kinds of very diverse families of DNA-targeting, DNA-cleaving enzymes in CRISPR, as well as one RNA-targeting system. But in the last, you know, 10 or 15 years we've actually got the ability to sequence a lot more microbial genomes than what was available when CRISPR was beginning to be discovered. So even in the last four or five years we've seen, in terms of the number of sequenced microbes, kind of this exponential increase, and most of it has been in the last couple of years. So I think a really exciting question that we're wondering about is: if we can discover things as powerful as CRISPR in microbial genomes, could there be other systems out there that might confer maybe different kinds of functionality that might also be very powerful biotechnology? And so there might be opportunities for sequence mining and discovering new protein effector systems in these genomes. So I'd like to thank you all very much, and I would like to thank everyone in the Zhang lab, and I'd be happy to take any questions. [Applause] Yeah, that's a great question. Yes, so Cpf1 edits also create indels at the break site that can be quantified with either of those methods, or any of those three methods. One important feature to keep in mind with Cpf1 is that it almost exclusively creates deletions instead of insertions. So with Cas9 you oftentimes get a one-base insertion; with Cpf1 you almost never get that, so maybe ninety-eight percent of the repair outcomes will be deletions. That doesn't affect TIDE or NGS very much. It actually has a side effect of making Surveyor a little bit more accurate than for Cas9: with Cas9 it has a tendency to maybe underestimate a little bit; with Cpf1 it tends to be a better reporter for efficiency. So the optimal stoichiometry between the guide and the Cas9? Yeah, so I think at least the protocols that we are accustomed to following typically maybe give a little bit of excess of guide relative to Cas9
just to make sure that complex is kind of fully formed. But I think as long as there's a good stoichiometry where you're pretty sure you're getting a lot of the active Cas9-guide complex forming, I think the experiment should be pretty robust to that. Yeah, we found that the U6 promoter works pretty well. There's alternative promoters that actually have a tRNA promoter, so instead of a U6 promoter you can put a tRNA in front of the guide, and that kind of gets processed in a way that maybe is more versatile because it doesn't have the requirement of the G. So with the U6 promoter you want your guide to start with a G, but with a tRNA promoter you can get away with guides that don't start with that and not have to change it to a G. But I think for a lot of purposes the U6 is kind of the gold standard and it works really well. Curious whether any study has been done: for a specific site in the genome, if there are two copies of the site in the genome, how many molecules of CRISPR-Cas9 would we need to give to the cells so that they efficiently scan throughout the genome and identify the site? So how many molecules of Cas9 do you need in any individual cell? You know, I don't have an answer for that. I think maybe the best answer I have is for protocols that deliver ribonucleoprotein complex directly to the cells, typically just kind of measuring the concentration of overall protein that's delivered to the well, and if you divide that by the number of cells and kind of estimate the delivery efficiency, that might be a good estimate for how many molecules are actually getting in. So yeah, I guess I don't have a hard number off the top of my head right now. Great talk. When you were explaining the position effect in the guide RNA, you showed guanine as the 20th nucleotide. Is there any specific reason why you put
that in? Yeah, I think that's really only applicable for when you're delivering an expression cassette that drives expression of the guide, rather than the guide directly. So it turns out the U6 promoter basically requires a G at the very beginning to begin transcription, and if you don't have that G there, it either doesn't transcribe well at all or maybe it starts transcription at a site that's a little bit different, and so your guide might get artificially truncated. And so it's just because of the U6 promoter. What is the effect of chromatin packing, the density of chromatin within the nucleus, on the efficiency of targeting, for euchromatin versus heterochromatin? Yeah, that's a great question. I recall there was a fantastic paper by Dr. Jonathan Weissman's group where they found that nucleosomes actually impede CRISPR-Cas9 activity significantly. And so yeah, I think if the chromatin is highly compact and there's just lots of stuff there, then Cas9 is not as good at finding and unwinding it, and I think that's consistent with what we've seen: for regions of open chromatin the editing efficiency definitely is higher. So I guess if you have a choice of target sites and one is over right where some nucleosome is sitting and another one is maybe farther away, then it's probably good to take that into account too. We're gonna move on now in the interest of time, but Linyi will be here during one of the breakout sessions and available to answer lots more questions. So our next speaker is Julia. She's a graduate student in the lab, a fourth-year graduate student, who is our master of screens, and she will do about a half-hour overview on using CRISPR for large-scale screens. Thank you for the introduction, and thanks everyone again for coming out here for our annual genome engineering workshop. Today I'd like to give a brief overview of how to set up a genome-scale CRISPR screen. Most of this is in the protocol that we published last
year, so I won't go into the details. Instead I'll give an overarching talk on just the main points that you need to think about when setting up a CRISPR screen. So to begin, forward genetic screens are very powerful tools for mapping genotypes to phenotypes. Traditionally this was done by chemically mutagenizing flies, picking out the flies with interesting phenotypes, such as white eyes or mutations in the antennae, and going back and trying to figure out which genes, which essential developmental genes, actually contributed to the phenotype of interest. Now recently, with the development of shRNAs, this process has become much quicker and much easier to do. The way shRNAs work is that the shRNA is processed into siRNA and can knock down base-pair complementary mRNAs. Now in the screening format, by sequencing the shRNA we can map the targets of the shRNA and figure out which shRNAs contributed to the phenotype of interest. And so, thanks in a big way to large-scale next-generation sequencing, shRNA screens have been very useful in discovering a lot of biological processes. The issue with shRNAs mainly is that using shRNAs can be very unpredictable. A lot of the time shRNAs will have a lot of off-target sites, and this kind of thing tends to confound the screening results. So more recently, the CRISPR-Cas9 knockout, as Linyi talked about earlier, has been shown to very robustly knock out target mRNAs by basically introducing a premature stop codon, leading to the depletion of the target mRNA via nonsense-mediated degradation and decreasing the amount of functional protein in the cell. Because the Cas9 knockout is much more reliable and has lower off-targets than shRNAs, the screening results from CRISPR-Cas9 screens tend to be much cleaner. CRISPR has also been shown to be usable as a robust DNA binder: by inactivating the two residues that are involved in the DNA cleavage of Cas9, we can then
add a bunch of activation or repression domains to make that Cas9 a robust activator or repressor, respectively. In this case I'm showing the dead Cas9 activator developed by our lab that has modified guide RNAs with MS2 hairpins. These MS2 hairpins then bind to additional activation domains, p65 and HSF1, and together there are three activation domains on this dCas9 activator, which makes it much more robust than a previous version involving only dead Cas9 and VP64. So if you target the dCas9 to the promoter sites of target genes, then this can lead to a very robust transcriptional activation, leading to an increase in mRNA production, and as a result the levels of the protein of interest also increase. So together, Cas9 knockout, dCas9 activation and dCas9 repression have been used for a lot of different genetic screens. Some of these examples include studying drug and toxin resistance, gene essentiality, more recently cancer immunotherapy, and non-coding elements. In essence, CRISPR-Cas9 screens can be used to study any biological process with a screenable phenotype, and now I'll go into detail on what exactly we define as a screenable phenotype. So there are three main types of screening selections: there's the positive selection, negative selection and the marker gene selection. If we think about the blue guide RNA as a guide that makes a change that gives us a screening phenotype of interest, in a positive selection screen the cells with the blue guide RNA will proliferate, and because this is genomically integrated, the blue guide RNA will amplify with the proliferation of the cell, and at the end of the screen, when we sequence the guide RNAs in the pool, we will see more of the blue guide RNA. In the negative selection screening phenotype, the blue guide RNAs are depleted, and at the end of the screen
we see fewer blue guide RNAs than we started out with. And then finally the marker gene selection: in the marker gene selection you either have a reporter line where the screening phenotype is marked by fluorescence, or you can stain with an antibody, followed by FACS sorting to identify the cells with the phenotype that you want, and in this case the blue guide RNA will generate a cell that is green. The marker gene selection screens are much more versatile than the positive and negative selection screens, but at the same time they're also a little bit more difficult to set up because it requires a lot of FACS sorting. So overall the workflow of these CRISPR-Cas9 screens starts out with making the library for whatever purpose you want (I'll go into the detail of this in a little bit), then packaging the library into lentivirus, transducing this lentivirus into cells, and then harvesting and analyzing the screen using various different tools, followed by another four to five weeks of validation. So in total the screen takes about three to six months to complete, but at the end you get an unbiased, systematic assessment of which genes are involved in the mechanism for your target phenotype, making it a very powerful tool. So starting out with library construction, we've provided several different ready-made libraries on Addgene. These include the GeCKO library, which has the one-vector system, with the Cas9 and a guide RNA on the same vector, or the two-vector system, where the Cas9 and the guide RNA are on two different vectors. The knockout libraries have six different guide RNAs targeting the 5' constitutive exons. In the activation libraries we also have the two-vector or the three-vector system, and they have different components of the SAM activator in each of the vectors, and these tend to have three different guide RNAs targeting the 200 base pair region upstream of the annotated transcriptional start site. Both of these libraries come in either
the human genome-scale version or the mouse genome-scale version, and I'll point out that even though we have the GeCKO v2 libraries on Addgene, I would actually recommend going for the Brunello library, because that has a much more updated efficiency and specificity score for screening. At the same time, because a lot of screens, especially non-coding screens or targeted screens, require different libraries than just the blanket genome-scale library, we also provide different methods for generating custom libraries. So starting out with a custom targeted library: a targeted library is very useful because in some screens you want to reduce the number of cells that are required for screening. I will go into a little bit more detail on the number of cells that are required for a typical screen, but that number is generally around hundreds of millions of cells, and so for sensitive primary cells, for example, you really have a motivation to reduce the number of cells that are required for your screen. The targeted screens are also very useful for when you are only concerned with a subset of genes, for example only kinases for a cancer screen. So the way that this works is that we provide a script to isolate a subset of sgRNAs from the larger pool of the genome-scale library, based on a set of target genes that you want in your library. Then we also have the de novo library design. The way the Python script for this works is that you have a genomic region of interest, for example a non-coding region or, for instance, different long non-coding RNA promoters, and then you can pick guides based on minimizing off-target activity, maximizing on-target activity, and other criteria that help with sgRNA synthesis, and this includes multiple guides per target and also non-targeting guides. The non-targeting guides help with determining the noise level of your screen and basically determining whether or not the
enrichment you see in your screen is a result of noise rather than of your applying a screening selection. And for both the targeted and de novo designed libraries we provide a very easy to follow protocol that involves basically PCR-amplifying the oligos, Gibson assembly into the plasmid vector of interest, and then amplifying this through electroporation. Throughout the custom cloning process and throughout the entire screen it's very important to maintain representation by scaling up all the reactions accordingly, and we actually have a table in the protocol stating, basically, for a certain size of your library, how many reactions of PCR you need to do, how many reactions of Gibson you need to do, and how many bacterial clones you need to generate to maintain representation. And then finally, before starting the screen, because the screen is a lot of work, we always recommend that you verify the library representation by NGS, even if it's a library that's already available from Addgene. That's because there are always different biases in the amplification, and you just want to make sure that your library is not super skewed and all the guides are there before doing all the downstream work. And our criteria for whether or not you should proceed with the library are basically: over 70% of the guide RNAs perfectly matching the library, less than 0.5 percent undetected guides, and a skew ratio of less than 10. The skew ratio is a measurement of the number of NGS counts for the top 10% of the guides to the bottom 10% of the guides. OK, the next step for genetic screening is packaging the library into lentivirus. For us, lentivirus is usually the most ideal packaging method for screening because it integrates into the genome, which means that for positive and negative selection screens your guide RNA will amplify with the genome and amplify or decrease with the number of cells in your population, and so you can just measure the NGS
counts of your guide as a proxy for the number of cells in your population. So in the paper we include a protocol that has both Lipofectamine and PEI. Lipofectamine was what we used for all of our published screens, and PEI we've also found to give comparable results and is much, much cheaper than Lipofectamine, so we also provide that protocol. In certain cases, for non-dividing cells such as neurons, you can also use AAV, but this is much less common for screening. And then after lentivirus production we will put the library into the target cell line and apply the screening selection. So calculating the lentiviral titer in a screen is super important, because you want to keep the lentivirus MOI, or multiplicity of infection, low enough such that most cells only have one guide RNA that is integrated into the genome. This means that when you read out your screen you don't get a lot of cells that have multiple guide RNAs, and therefore multiple different perturbations, that can confound the screening analysis. What the multiplicity of infection measures is the number of cells surviving the antibiotic selection divided by the number of cells without the antibiotic selection, and for measuring this we provide a spinfection method for pretty hardy cells or the mixing method for things like stem cells and neurons. This really varies depending on your cell type. What I would recommend is to try to spinfect your cells and, if they die, then do the mixing method. For the sgRNA library we recommend an MOI of less than 0.3, and then after calculating the lentiviral titer we scale up accordingly. So for scaling up the transduction for the screen, for example for a library of a hundred thousand different sgRNAs, if we want to screen at a coverage of five hundred cells per sgRNA, which is the minimum that we would recommend, and we are transducing at an MOI of 0.3, that means that for each biorep you need to start out with a hundred sixty-seven
million cells. So right now, based on this calculation, probably the targeted screens are looking more attractive. So this is a lot of cells, and you need to plan accordingly, because usually for most screens we recommend at least two to four bioreps, so it really adds up to hundreds and hundreds of millions of cells at least to start. And then during the screen, to maintain that representation after the cells have already been selected, we just maintain fifty million cells. OK, so for knockout screening, typically for most screens people will transduce the Cas9 at an MOI that is fairly high but less than 0.7, and then introduce the guide library at an MOI of less than 0.3. We recommend starting the screen seven days after library transduction, because that's when the indels saturate, so you don't want to start applying the screening selection earlier than that because you might not get the maximal indel rate. And throughout this process, again, maintain representation at greater than 500 cells per guide. For most screens I would recommend starting with two bioreps, and then if the data looks good, do another two bioreps; it really helps clean up the data. For an activation screen the first couple of steps are the same: you put in the activator components at an MOI of less than 0.7, then put in the guide RNA library, and for activation, because this acts much quicker, you can start the screen as early as four days after library transduction. So, general considerations when you're applying your screening selection: obviously this really varies from screen to screen, but basically you want to choose screening parameters that maximize the difference between the experimental and control conditions. Typically the way people do this is that a couple of genes are positive controls and a couple of genes are negative controls, and these two sets of genes will help you set, for example, what drug
concentration you're using or other parameters that affect your screen during the selection process. For drug selection we usually use the IC50 value of the drug, because that value basically avoids over-selecting and under-selecting your cells during the screening process. Again, it's very important to maintain coverage of over 500 cells per guide in your library throughout the process. If it's your first time setting up a screen and you don't know how long to run the screening selection for, I would recommend just collecting multiple time points, because if you're working with dividing cells, chances are you'll have a lot of extra cells if you're just maintaining 50 million cells throughout the screen. So what you can do is, at week one, freeze down the cell pellets at minus eighty and just keep them there, and harvest them if you think those will be useful, because freezing down the cell pellet really is not too much work but redoing an entire screen is a lot of work. Throughout the screen it's also very important to remain very consistent. So for screening we won't change batches of FBS, for example, because even things like that, for sensitive screens, make a huge difference to your screening result. OK, then at the end of the screen we will harvest the cells for screening analysis. In the protocol we use RIGER, which was originally developed for analyzing shRNA screens, to identify candidate genes. The way RIGER and many other screening analysis tools work is that you basically normalize the experimental guide counts to the control based on NGS counts, then you rank all your guides based on the ratio of experimental guide count to control, and then for each gene you look at the ranking of all the guides that target that gene and figure out the statistical significance of whether that gene is significantly enriched or depleted. Right now I think the best tool that is out there is probably MAGeCK, developed by another group. MAGeCK
is a lot more updated, and I think besides MAGeCK there's also STARS, which is very useful for screening analysis. And then after you get all the gene rankings from the screening analysis, you can take either the average of the rankings or the overlap of the top genes to figure out which ones you want to follow up on. So the last step of the screen is the validation process. Validation is arguably the most important part of screening, and it's going to be the part of the process that takes the longest time. The screen is very noisy, and at the end of the day you get a ranked list of genes, but you really don't know if each gene truly confers your phenotype. Obviously if you set up the screen very well and there are a lot of genes involved in your screening phenotype, most of the top 10 will be genes that are involved in your phenotype, but for a lot of screens the phenotype is very mild and not all of the genes that are enriched will be the ones that confer your phenotype. So you want to go back and introduce the guide RNAs for each gene individually to make sure that the phenotype is what you expect from the screen. So for the first part of this, what you will do is pick the genes that you want to validate, pick all the guides that target that particular gene in the screen, and clone each of them individually into the vector that you used for your screen, package into lentivirus, and then transduce and select the same way you did in your screen, except now all the guides are individually packaged, so you know that if there is a phenotype, your guide was the one that conferred the phenotype. So for a knockout screen, after all this we usually will extract genomic DNA from the validation cell lines, and then in the protocol we use two rounds of PCR for the target site with Illumina adapters for NGS. The reason why we do two rounds of PCR for this process is because usually
you're validating, say, at least 30 to 40 different target sites, and so you want to have PCR primers that are very versatile and can append different barcodes to different target sites, and so we recommend the two-round PCR for this process. And for the screening phenotype, you need to again check that these guides do confer the screening phenotype, assess the indels, and then a lot of times what people do is verify by Western blot that the protein of interest was actually knocked out. For activation, what we provide is a homebrew protocol for rapid extraction of RNA and reverse transcription, and this protocol doesn't involve a column. You basically take a plate of cells, typically 96 wells, and then you add the lysis buffer to it, mix up and down a couple of times, and then transfer this mix into the RT reaction. This is basically a reverse-engineered version of the commercial Cells-to-Ct protocol, which costs, I think, about a couple thousand dollars per 96-well plate, and in contrast our protocol costs about maybe one or two percent of that. So it's very cheap and very easy to use, and I would recommend, even if you're not doing screening, at least taking a look at the RNA extraction and RT protocol that we provide. And then after this we recommend doing TaqMan qPCR. TaqMan qPCR is preferable to SYBR qPCR because there's a third primer, a probe, that basically releases a fluorophore when the polymerase goes through it, and so it provides an added layer of specificity to the primers. Then again verify the screening phenotype and make sure that the guide RNA actually activated the gene of interest by assessing the mRNA and by assessing the protein by Western blot. So finally, the most important screening considerations. Again, I've repeated this throughout the talk, but I can't emphasize enough how important it is to maintain the screening
coverage of over 500 cells or molecules or whatever throughout the screen. And it's very important to include controls such as non-targeting sgRNAs or sgRNAs that target genes known to affect your phenotype, and then it's also very important to include control conditions that did not have your screening selection, so you know what the distribution of guides looks like without applying the screening selection. And then throughout the screen remain consistent, and use the same protocols for screening and validation so that everything checks out. And finally I'd like to thank all the people who helped me put together this protocol, everyone from the Zhang lab. Thank you very much, and I'll take questions now. [Applause] Hey Julia, I have a question about validating the guides. So you said you use the guides you're using in the screen. Is it possible to use independent guides targeting the same gene? So some people would do that too, in addition, to make sure that the phenotype is not due to some off-target from some guide in the library, so that's also possible. I usually will start at least with the guides from the library so that you know that there should be a phenotype, right? And then if you're worried about off-targets, usually that's not a huge concern if you're using multiple guides, but if you're really concerned about off-targets you can always use independent guides. I had a question with respect to the activation screens or the inactivation screens, so the knockout screens, in terms of whether you design your guide RNA closer to the transcription start site or, for example, to enzymatic domains, or a mix of both. Would you be able to comment on which ones are more effective? So for knockouts there's actually a paper showing that if you target the functional domains of the protein, for instance kinases, I think they did it for kinases, you get much better screening results. In our genome-scale library designs we didn't do that because for
most proteins you don't actually know what the functional domains are. So what we do is we knock out at the 5' constitutive exon and hope that it generates a missense, a frameshift in the protein and then knocks out the entire protein. For activation, what we found is that if you target the 200 base pairs upstream of the transcriptional start site, for all the genes that we've tested this leads to a very robust transcriptional activation of the target gene, and so that's how we designed our library. One question regarding targeted knockout, not knocking out the whole gene but a particular domain: is there any study that can modify protein-protein interactions? Because we know a lot about protein-interacting domains, are there any studies that particularly targeted that, to see if we can efficiently modify protein interactions instead of knocking out the whole gene, but more targeted in that respect? So in a screen, not that I've heard of, but I'm sure for single genes people have done that, to specifically disrupt, say, a particular interaction versus frameshifting the entire protein. OK, so thank you Julia, and Julia will be available during the breakout session for questions. Agilent will also be hosting a breakout session, and they'll be talking about ultra-high-quality custom guide RNA libraries for CRISPR-based functional genomics, and you may be able to get some other tips from them about designing libraries for your screens. So we'll move on to our last talk for this session, which is kind of an introduction to RNA-targeting systems. Max Kellner, who is a visiting master's student in our lab from the University of Vienna, has been working a lot with some of these enzymes, and he will give you guys an overview of these enzymes and some of the tools that have been built using them. I'm going to switch gears now: instead of talking about DNA-targeting enzymes, I'm talking about
RNA-targeting enzymes, and specifically CRISPR-Cas13, which has been demonstrated to target RNA for the last couple of years, two years actually. So in today's talk I would like to talk about the what, why and how of RNA targeting using Cas13: what, in terms of an introduction to CRISPR-Cas13 biology and some in vivo and in vitro applications; why we think RNA-targeting enzymes could accelerate RNA biology research; and finally some design principles if you want to implement this in your experiments. So this is kind of a summary of the class 2 CRISPR effectors. We've heard about Cas9 and Cas12a. Cas12a is also Cpf1; these were renamed to Cas nomenclature, so Cas12a, as I said, is Cpf1. Instead of targeting DNA like Cas9 and Cas12a do, Cas13 targets single-stranded RNA, and it does not use the conventional RuvC or HNH domains like the DNA-targeting enzymes, but it uses HEPN domains, and what these are I will explain in a little bit. But there are also similarities to other class 2 effector enzymes, including the guide RNA structure, so you can see that it's actually very similar to the Cpf1 crRNA, having a direct repeat sequence and a spacer sequence that's complementary to the RNA. Instead of using a PAM, so a protospacer adjacent motif, like Cas9 and Cas12, it actually, at least in vivo, does not use such a motif. In vitro and in bacterial cells it was found that something like a protospacer flanking site exists, so a single nucleotide that can affect targeting efficiency. So these HEPN domains are essentially conserved among Cas13 enzymes, and they are also found in other RNA-targeting enzymes, even ones that are found in human cells like RNase L. These HEPN domains are marked by two conserved residues, arginine and histidine, and traditionally HEPN domain-containing proteins have a metal-ion-independent catalytic RNA degradation step, but the fact that they
dimerize to form a cleavage interface Cas13 protein kind of gets around the dimerization by having two of these domains implemented in one single effector complex but Cas13 is not the only CRISPR enzyme that contains these HEPN domains there are other ones found in CRISPR type III systems namely Csx1 and Csm6 so this suggested in the beginning that because Cas13 has HEPN domains it probably is an RNA targeting enzyme this is kind of to show you the CRISPR type VI diversity so in the last year there have been numerous papers showing that there are different subtypes of the type VI family starting with CRISPR Cas13a which was first described biochemically in 2016 and then followed by Cas13b c and most recently d and the reason why I show the slide is to show you the diversity of the loci so as CRISPR systems they all have a CRISPR array but you can also see other features like accessory proteins within that array namely Cas1 and Cas2 which are known for spacer acquisition and integration which are not found in other type VI subtypes but then also interesting accessory proteins found in these loci namely for example Csx28 and Csx27 found in CRISPR type VI-B1 and B2 systems but also the Cas13d accessory WYL domain proteins and these were found in vivo to at least modulate activity but one very interesting observation recently is that the Cas13d family members are actually much smaller than other RNA targeting enzymes like Cas13a and b as I said with these HEPN domains Cas13 is an RNA targeting enzyme and this is shown here that Cas13 operates in an RNA guided manner so without guide you don't get cleavage and it's also interestingly unlike other HEPN domain containing proteins metal ion dependent so addition of EDTA kind of kills in vitro RNA degradation and in contrast to RNAi or shRNAs or microRNAs which usually have a single cut site so if you run such a gel you usually get like two fragments because it cuts
once Cas13 seems to cleave the transcript across its length something that is either a native biochemical property or a collateral activity which is something that Cas13 has and I will come to in a minute this Cas13 system as I said also has a weird interesting property called collateral RNase activity which essentially means that when Cas13 loaded with the guide recognizes the target that's complementary to its crRNA it unleashes RNase activity on bystander RNA an RNA that is not specific or complementary to the crRNA in this case it's labelled as collateral RNA so this gel based result shows that if you have a labelled bystander RNA collateral RNA upon addition of crRNA and target you kind of see cleavage of this collateral RNA something that has been turned into a nucleic acid detection tool and I will finish my talk actually with this application to summarize Cas13 biology Cas13 is a CRISPR type VI system it has a dual RNase activity where one activity processes the crRNA the pre-crRNA so this allows you for example for new applications to have a CRISPR array and have Cas13 process this into the individual spacers this catalytic activity is independent from its target cleaving RNase activity which can be specific so acting on the RNA molecule that it's bound to or in trans after activation by a complementary RNA so why do we want to use Cas13 there are numerous applications that you can think of for in vivo studies this is from a review in 2011 where people looked into other RNA binding proteins and how they can be used to kind of engineer the transcriptome for example you can imagine that you want to visualize RNA RNA visualization has typically been done by inserting transgene sequences like MS2 sequences which are then recognized by coat proteins but it is kind of an engineering approach that is not amenable to endogenous genes so having RNA binding proteins allows you to
kind of visualize these endogenous genes by fusing a fluorescent protein onto it you can also imagine that you would for example like to understand how localization of an RNA affects its function so by fusing effector domains to your RNA binding protein you can kind of alter localization and as Cas13 and other enzymes are RNA targeting enzymes you can essentially degrade RNA in a targeted manner but you can also play around on more sophisticated levels for example modulating translation by enhancing it fusing initiation factors onto it or repressing it and you can also modulate splicing by just simply designing binding proteins or in our case Cas13 that target splice sites so you can modulate splicing by including or excluding certain sequences so the first application I would like to talk about is RNA knockdown in eukaryotic and prokaryotic cells this is kind of an example from the first demonstration of this in mammalian cells where we have a Cas13 expression plasmid where the crRNA is likewise driven by a U6 promoter you have localization sequences attached to Cas13 in addition to that you have a corresponding crRNA expression plasmid and for this particular assay we also have a reporter plasmid that has two luciferase variants on it one of which is targeted by the crRNA that's found in the crRNA plasmid which allows you after transfection to essentially read out Cas13 activity by relative luminescence bioluminescence activity this is kind of to show that among the Cas13 variants tested all of them have kind of robust knockdown activity shown on the left with Cas13b in our hands being more potent in RNA knockdown what we have here are two guides targeting luciferase and we plotted the guide normalized values for this luciferase and you can see that for example Cas13b6 and b11 are very potent in knockdown what is surprising is that Cas13 in vivo does not display collateral RNase activity so if
you do RNA-seq and look at differentially expressed genes and compare this to conventional knockdown strategies like RNAi you actually don't find off targets which is surprising given that in vitro we have this collateral RNase activity next I would like to talk about applications for RNA imaging using dead Cas13 so you kind of take Cas13 and mutate the HEPN domains which are the RNA targeting domains that then cleave RNA so if you mutate those you can use Cas13 for all kinds of binding applications including imaging and the first application was done by Jonathan and Omar in our lab where they used Cas13 fused to a GFP for its folding property but then also a zinc finger and a KRAB domain which essentially allows for negative feedback because for example if you have a Cas protein fused to a fluorescent protein and constitutively expressed this results in strong fluorescence independent as you can imagine of its ability to target RNA so what they did is they fused this repressor domain together with a zinc finger onto it so that when the RNA that is targeted is not present it should repress its own transcription and not be translocated into the cytoplasm this is kind of one of their paper figures where they show beta actin targeting guides two targeting guides and a non targeting guide and you can hopefully appreciate the subcellular localization that is distinct among those the middle lane is a staining for stress granule formation so these cells were treated with sodium arsenite to allow for visualization of stress granules where RNA molecules are trapped and these RNA molecules are targeted by Cas13 loaded with the beta actin guide and you can see that in the overlay image of the staining this was mentioned before but CRISPR Cas13 can also be used for RNA editing so maybe you've heard about Cas9 DNA base editing where Cas9 dead Cas9 was fused with a domain
that deaminates a base similarly we have engineered Cas13 fused to ADAR or the deaminase domain of ADAR2 which allows us now to do targeted base editing on an RNA level and the picture below kind of shows the architecture of such an interaction where you have the guide and within that guide you place a mismatch that's opposite of the A you would like to deaminate because this is an adenosine to inosine deamination event and this mismatch allows a hyperactive variant of ADAR to specifically edit this site so unlike DNA base editors which usually have like a window we can specifically alter that one site which allows for base conversions on the RNA level something that could be reversible instead of permanent DNA changes but I'm not going more into details about this because Omar from our lab will have a talk about this tomorrow and to finish on the in vivo applications it was also shown that Cas13 the dead Cas protein can be used for modulating splicing by targeting certain sequences found in exons like splice donor and acceptor sites and Silvana will talk about this tomorrow so again I'm also not going into detail on that so these are very exciting applications for in vivo targeting so some design principles it's actually very cheap sorry very easy because all our Cas13 expression plasmids have been deposited on Addgene so what you can do is simply type in Cas13 on Addgene and select your expression plasmid of interest the only thing you essentially need to do is design crRNA spacers as DNA oligos hybridize them together and Golden Gate clone them into the corresponding expression plasmid that you can also get from Addgene then you transform this into E. coli harvest the plasmid and it is ready for transfection so for transfection essentially deliver the reagents and harvest RNA about 48 hours post transfection and then you read out RNA targeting efficiency by qPCR or reporter assays so by luminescence as I've
shown with the luciferase vector or RNA-seq which allows us to actually look at the sensitivity and the specificity of RNA targeting and now I'd like to home in one more time on this collateral RNase activity which seems to be the case in vitro but in vivo we have not found evidence for that but in vitro at least it's a very useful application so we use Cas13 not only for transcriptome engineering in living cells but also for detection and you can ask yourself okay why do we want to do that it's because there are several applications which actually require on-site diagnostic platforms where you don't have sophisticated equipment like a thermal cycler or a fluorescence plate reader of any sort so this can be used for viral detection discrimination between bacteria but also because of its sensitivity detecting rare events like DNA mutations found in cell-free DNA so our lab has turned this Cas13 detection into a diagnostic tool by combining it with pre-amplification where your starting material is either DNA or RNA which is then isothermally amplified with RPA recombinase polymerase amplification reactions where the main difference is that it either has a reverse transcriptase or not so you can actually have RNA or DNA which is converted into double stranded DNA having a T7 promoter introduced by the primers which is then turned again into an RNA molecule by in vitro transcription and then you leverage Cas13's collateral activity to kind of recognize that specific RNA molecule and because it undergoes this conformational change it allows you to collaterally cleave a reporter which unlike TaqMan probes is not specific to a target sequence but can be a uniform reporter molecule used for all your applications so the variable in this application is essentially your guide RNA so Cas13 by itself is not very sensitive but if you couple that with pre-amplification you can get down to single molecule per microliter levels
or more recently 2 molecules per milliliter something that is actually the case for Zika infections and as I said it's not limited to RNA you can also use DNA as a starting material shown here by essentially having the same sequence that is targeted one form is RNA one form is DNA so in both cases you get very high sensitivity to kind of get around the requirement for a fluorescent signal our lab has turned this into a low-cost portable and colorimetric platform by changing the design of the reporter and using lateral flow technology similar to a pregnancy test where uncleaved reporter is sequestered on the strip at the first line the control line because it has an intact biotin tag which sequesters the whole reporter onto this first line but upon cleavage of this reporter you kind of separate these two parts where the FAM end is bound by an anti-FAM antibody and travels on to that second line the antibody capture line which allows you to read that out similar to pregnancy tests and this also shows very high sensitivity down to 10 to 100 molecules per microliter and we're currently optimizing it to be more sensitive some design principles for SHERLOCK it's similar to in vivo applications so you design your crRNA having a 5 prime or 3 prime repeat it's an important consideration because these different Cas protein enzymes have the repeat at either the 5 prime or the 3 prime end we currently use a 28 nucleotide crRNA spacer that's complementary to the RNA target and if you design this as a DNA ultramer and want to produce your own guide instead of ordering an already made RNA you have to include a T7 promoter and order this as the reverse complement so your RNA is made in the right orientation you also need to design RPA primers which are similar to PCR primers the difference being that they are a bit longer in size and because you want to turn that DNA molecule into an RNA molecule you also have to include a T7 promoter in the forward
primer or reverse primer depending on which strand you want to target which then allows you to kind of turn that into an RNA and detect it and finally after ordering all the reagents the RPA kit which is from TwistDx which is available as an RT-RPA or RPA kit then you have to make essentially Cas13 because unfortunately there's no commercially available Cas13 protein as far as I know but all these expression plasmids that we and others have used in papers are found on Addgene and these are SUMO tagged expression vectors so the purification protocol is fairly easy and these are highlighted in our papers so currently you have to make Cas13 yourself but this is actually not a very complex task if you are familiar with protein production you also need to buy T7 RNA polymerase an essential component of this in vitro transcription reaction and finally your fluorescence or colorimetric reporter for the detection part so basically if you want to implement Cas13 in your research or you're kind of lost in the CRISPR diversity and don't really know should I use Cas13 a b c or d or anything about your RNA targeting experiment just come see us after my talk essentially after the coffee break because we will have breakout sessions but it's not only true for RNA targeting but also for other CRISPR applications so just come see us after the coffee break at the breakout sessions with this I'd like to thank the Zhang lab Jonathan and Omar who have trained me over the last year to be kind of at least a lab expert in Cas13 biology Rhiannon and the other GE co-organizers and you all for listening and participating [Applause] questions apparently I'm sitting in the worst place quick question about the imaging ones for visualizing RNA mRNA I was wondering is it quantitative or qualitative can we actually measure the amount of RNA and with what kind of accuracy can it be used so let's say particularly we are interested in one
particular RNA can we trace it live to see if its expression changes in response to a particular event and how much of a dynamic range can we measure yeah so it's a good question so in the paper published in Nature last year we have essentially demonstrated in situ RNA detection imaging but not on a single molecule level so to answer your question whether it's quantitative or not we have not explored single molecule imaging technologies but yes you can image this live and we have done that in the paper and you can essentially highlight this so they observed stress granule formation in living cells something that is useful because now you can actually target endogenous genes or in vivo tissue yeah yeah yeah yes so that's an interesting question because there are many diagnostic platforms which require you to have for example very pure starting material in a paper published about two weeks ago which we call HUDSON we actually show that you can use urine for direct detection so you essentially heat treat urine to kind of deactivate endogenous RNases and you can then use the sample directly we've also played around with crude extracts from genetically modified plants so you can do direct GMO detection by just simply grinding them in a very easy lysis buffer so it seems to be a very robust technology in terms of contaminant sensitivity hello we've seen that the Cas13 system can be used for RNA editing and one of the applications being that the localization of an mRNA can actually be changed so let's assume that the mRNA has a different signaling sequence let's say a mitochondrial localization signal can that be replaced with a nuclear localization signal using Cas13 so you're essentially asking whether you want to use RNA editing to convert this mitochondrial signal into any other sequence or you would just like to target it by Cas13 which has an
effector that then delocalizes or relocalizes it to a different place is that the question I need it to be specific to the nucleus yeah so what you can do is essentially target this RNA by having a nuclear localization signal on your RNA targeting enzyme which is Cas13 which makes it a very nice tool for also targeting or imaging nuclear localized genes something that usually is very hard to achieve with RNAi technology so yeah you can redirect the localization of RNA molecules by having the nuclear localization tag present on your Cas13 enzyme does that mean that the mitochondrial localization signal becomes redundant or does it still exist in the RNA along with the NLS yeah so I cannot give you a detailed answer but maybe we can talk about this after this talk so I don't think we have explored this in detail but it should be possible thank you all right all right thank you Max and thank you to all of our speakers in this first session yeah so before we break for coffee I just wanted to bring your attention again to this so we'll do like a 15 or 20 minute coffee break and then after that we'll have breakout sessions so we have four different sessions these will run concurrently you're free to move from one session to another in the auditorium here kind of at the front we'll have experts on Cas9 and at the back we'll have experts on Cas13 for any kind of RNA targeting applications and then in the lobby itself Lin Wu is here she's the director of the genome modification facility at Harvard and she will be speaking about making mouse models using CRISPR and she'll be out in the lobby and if you go through the lobby to the very end there's the Olympus boardroom and Agilent will be in there and it'll be sort of a general screening session so Julia will be in there who spoke earlier and Agilent and some other people will be in there to kind of talk about screening applications and then through the lobby and upstairs will be
Benchling and they have a number of CRISPR guide design tools and just general electronic lab software that is really neat and a lot of people in our lab use it so I think they will also hopefully have some helpful things to say to you guys so enjoy some coffee and there's some snacks out there and then we'll see some of you back here in the auditorium for breakout sessions and we'll meet back in the auditorium at 4 o'clock for the keynote speaker all right good afternoon welcome to genome engineering 6.0 we have guests from all four corners of the US and from many other countries around the world so thank you so much for coming here and I hope the next day and a half will be a series of very exciting and stimulating conversations and learning experiences so if you didn't have your questions answered so far a lot of us who will be wearing these gray t-shirts with the Cas schematic on the front will be around so feel free to find any of us during the next day and a half and we're happy to answer any questions and of course after the symposium feel free to reach out to us either directly or via the Google forum and we'll try our best to answer as much as possible and then after this session we're gonna have a poster session and there will be beer and drinks and some food so hopefully that will help lubricate further discussion around genome engineering anyway but here I think we're all here for the main program for this afternoon which is the keynote by Eugene Koonin for those of you who may not be familiar with Dr.
Koonin Eugene is one of the premier scientists at the National Institutes of Health in the National Center for Biotechnology Information he has been working on CRISPR before there was even CRISPR and had put forth many of the fundamental understandings and functional descriptions of what CRISPR may be doing and since then Eugene has been pioneering the investigation of CRISPR diversity a lot of our understanding of how CRISPR systems are organized really comes from the work that Eugene and his colleagues at NIH have been doing and this is only a small fraction of what Eugene has done he's published over 800 papers and I think of those 800 papers CRISPR is only a small fraction and there are many many other really extremely fascinating aspects of evolutionary biology microbiology even evolution of you a biology and Eugene has really sort of got his DNA all over those different topics anyway without further ado let's welcome Eugene to teach us more about CRISPR and the broader evolution of biological systems [Applause] good afternoon it's a great pleasure it's a great honor to speak once again at this wonderful workshop as I don't find before it says really and I'm speaking here for the first time in a row and it has really become one of the hallmarks of the year for me so first of all I want to thank the organizers very much and most particularly Feng for putting this all together and making this possible this is what I'm pleased to say every year this year however there is something very special I want to congratulate Feng on his election to the National Academy of Sciences just a few days ago I think he richly deserves it and a round of applause from all of us right now so I'm going to talk to you about expanding the CRISPR horizons the CRISPR universe so to speak towards a deeper and at some point eventually comprehensive understanding of what is the entirety of genes that are functionally linked
to the CRISPR systems and more importantly of the biological processes in microbes to which CRISPR makes a contribution as we shall hopefully see towards the end of this talk far beyond straightforward defense functions I have to apologize for the error on this slide here I have a previous meeting on it for anyone who might have been present at that meeting and I know several people were there the overlap is minimal okay so this one shows you the layout the modular layout of the CRISPR-Cas system somewhat arbitrarily but I think not entirely so rooted in the functions of the products of the cas genes this classification divides the Cas proteins between distinct modules that are involved in adaptation acquisition of spacers which makes CRISPR the adaptive immunity system it is interference which is the actual recognition and disruption or inactivation of the target genomes which makes CRISPR-Cas the efficient defense system it is and expression and pre-crRNA processing which is a necessary intermediate step beyond and apart from this there is this ancillary module of genes that are often associated with CRISPR-Cas systems but whose functions as we understand do not immediately fall into one of these other modules and it is into this ancillary accessory module that I would like to venture today with a variety of analyses that we have done and that I will try to describe to you quickly enough but I want to start in a somewhat more traditional way and gradually build up to go into these other dimensions of CRISPR so this slide that comes from our review paper that we published with Feng and Kira Makarova about a year ago shows you the latest classification of the CRISPR-Cas systems I will note it's not my goal today to describe this to you in detail but just a couple of points so at the deepest level the CRISPR-Cas systems are divided into two classes class 1 and class 2 you know the
difference lies in the organization of the effector modules that is those modules that are concerned with functions downstream of adaptation the processing of crRNA and interference so in class 1 these are very elaborate complexes of multiple Cas proteins making up the effector modules whereas in class 2 it is at least at face value on the surface much simpler the whole effector function resides in a single large protein such as for instance Cas9 which makes the systems of class 2 so attractive and in some cases so efficient genome editing and genome engineering tools so most of my talks at the previous editions of this workshop were dedicated to the discovery and characterization of novel class 2 CRISPR-Cas systems in particular the discovery of several new subtypes of type V that shed light on the evolution of the effector complex and of type VI that is the first and so far only case of CRISPR systems dedicated to recognition and cleavage of RNA and that in the able hands of Feng's lab and some others have now become very very useful new tools of molecular biology so I want to expand on this even more today so just very recently we managed to further expand type VI with subtype VI-D through the efforts of the alumni of Feng's lab now at the Arbor Biotechnologies company David Scott Winston Yan and David Cheng I believe David and Winston are here now so good so you know this was the discovery and characterization of the novel new subtype of type VI systems which we denote Cas13d which was simultaneously and independently studied by other alumni of Feng's lab Pat Hsu and Silvana Konermann in San Diego which is very interesting in that it includes the smallest shortest proteins among type VI varieties which are about 20% smaller than others which makes them obvious good candidates attractive candidates for molecular biological tools but there is something else that I want to emphasize as a transition to my main subject
today that other aspect is that these Cas13d effector proteins are associated with a special kind of accessory proteins which I will describe in some detail so phylogenetically Cas13d is a sister group of the now well characterized Cas13a and you cannot see much in the alignment on this slide but what you can see I think is the diversity of these proteins which is an intrinsic feature of many Cas proteins but Cas13 in particular beyond the two HEPN domains of the characteristic ribonuclease you really cannot see much conservation even within this group let alone between so Cas13d the subtype VI-D in itself is fairly diversified found in a variety of diverse bacteria as well as in metagenomes and the notable feature of the locus organization here is that almost always the cas13d gene is accompanied by another gene which encodes a protein containing the so called WYL domain or will domain as we pronounce it for simplicity which is not particularly well characterized but it's at least predicted to be a nucleotide binding signaling domain and in the experiments reported in this paper Winston and others have shown that addition of the WYL domain proteins significantly by about an order of magnitude increases the efficiency of interference by the Cas13d protein so in addition to the attractiveness as potential tools the study of these systems reveals a novel type of regulation of CRISPR activity incidentally when we look back at the literature what we notice is that these WYL domain containing proteins can also regulate certain type I systems in this paper by Wolfgang Hess and coworkers a stimulatory effect has been shown on the activity of type I systems and it has been proposed that this protein is a transcriptional activator the results in this paper suggest this is probably not the case this is done through direct
interaction and ligands of which remain to be identified so that was a brief introduction and connection to my previous presentations here where we discovered new types of effectors of CRISPR-Cas systems now I want to focus more on the ancillary or accessory proteins and I want to begin with this by now I think rather well known to everybody who is interested in CRISPR but I think spectacular case in point so type III CRISPR-Cas systems all of them contain the Cas10 protein which is the large subunit of the effector complex as I mentioned a few moments ago in class 1 CRISPR-Cas systems including type III the effector complexes are elaborate machineries made of multiple distinct Cas proteins and the largest of them the large subunit is the Cas10 protein but apart from the structural role my colleagues and I predicted quite many years ago as Feng said in his introduction before CRISPR that these proteins contain all the signatures of catalytically active nucleotide polymerases or cyclases and therefore must possess this activity for many years this prediction had not been tested and became a source of a bit of an embarrassment if I may say so however things changed about a year ago when the two laboratories of Martin Jinek in Zurich and Virginijus Siksnys in Vilnius simultaneously discovered the activity of the polymerase cyclase domain of the Cas10 protein and the role of this activity in the functions of type III CRISPR-Cas systems so what happens is this upon target recognition by the effector complex the conformation of Cas10 changes and the polymerase cyclase domain gets activated and starts oligoadenylate synthesis these oligoadenylates become ligands for the so called CARF domain that is present in an accessory protein which is found not in all but in the majority of type III systems and known variously as Csm6 or Csx1 and
consists of this nucleotide binding CARF domain which means CRISPR associated Rossmann fold which was another of our computational predictions that until this work was completed remained untested and another effector domain which most often represents a HEPN domain ribonuclease homologous to that in type VI CRISPR-Cas systems and sometimes other nucleases and so it has been shown in these studies that the CARF domain binds the oligoadenylates produced by Cas10 this binding changes the conformation of this protein and activates the HEPN domain that then starts nonspecific RNA degradation not of the target RNA but of all the RNAs around potentially as we strongly suspect connecting the CRISPR response to programmed cell death in the infected organisms in any case I think it's a very remarkable story that shows us that these accessory proteins of CRISPR-Cas systems can be involved in whole pathways of signal transduction whose role probably goes far beyond CRISPR-Cas so continuing along these lines of discovering the functions of CRISPR ancillary or accessory proteins here's a protein whose functions are rather poorly understood denoted Csn2 which is present primarily in type II-A CRISPR-Cas systems so here we momentarily move back to class 2 it is known to inhibit the non-homologous end joining repair by type II CRISPR-Cas systems what the biological role of this inhibition is is still rather unclear what we found to great surprise with Kira Makarova in a still unpublished analysis is that this Csn2 protein is actually an inactivated P-loop ATPase so in the ancestral forms of this protein that can be reconstructed by the analysis of the phylogenetic tree apparently all the catalytic motifs required for ATP hydrolysis have been present and the protein performed some kind of energy dependent ATP dependent function here which probably still can be elucidated in those sort of relics of ancestral systems where the protein
remains, according to our prediction, active. In the great majority of the CRISPR-Cas systems, the ATPase activity has been degraded, lost, and the protein assumed a purely structural function.

So all these observations prompted us to go about this business systematically: to derive a methodology to systematically identify ancillary genes in CRISPR-cas loci, try to predict their functions in the CRISPR-Cas response and beyond, and distinguish functional associations from spurious ones. In order to do so, we did what we always do, namely, developed a dedicated computational pipeline. The feature we were after we denoted CRISPRicity, and CRISPRicity is supposed to mean the quantifiable association of a particular gene with CRISPR-Cas systems. I shall note that this approach is quite generalizable; by doing so you can explore all kinds of functional connections in microbial genomes. Speaking very simply, this is all based on the now well-known fact, established by Jacob and Monod almost 60 years ago, that functionally linked microbial genes often form operons, strings of co-expressed genes on the chromosome, and, as has become clear later, the operons are highly variable, so you really gain a lot of information by comparing the operon organization in different bacterial genomes. So specifically, when it comes to CRISPRicity, what we do, using other, previously developed computational pipelines that I don't describe here, is comprehensive identification of CRISPR-cas loci in prokaryotic genomic databases. We do it by a sort of three-pronged approach: we detect CRISPR arrays using the appropriate computational methods, we detect adaptation modules, class 1 and class 2, and we detect effector modules, because while many CRISPR-Cas
systems have it all, there are quite a few that actually have only one or two of these elements. So we applied the previously developed methods to identify each of these elements and then enumerated all the proteins in the respective genomic neighborhoods from all genomes that contain at least one of these elements. Then we clustered all the proteins, to work with gene families, or protein families, rather than individual proteins, and then we determined the strength of the association. The strength of the association is a very simple thing: it's simply the number of occurrences of a given gene in the vicinity of CRISPR-cas loci normalized by the total number of occurrences, which shows you how specific the association of a given gene with the CRISPR-Cas system is. But then you have to do extra analysis in the multidimensional space. I will not go into this in much detail, but you have to factor in the abundance of genes, because it's very clear that if you have the number 1 here, only one instance in a CRISPR-cas locus, and the number 2 here, two occurrences of a gene altogether, it will formally give you high CRISPRicity, a strong association, but obviously will mean very little. So we have to factor in the abundance and the actual distance from these elements, because the closer the better, so to speak, the stronger the association. And you have to isolate the domain of the parameter space that contains genes with properties close to those of the known cas genes, and these are our candidates, which are afterwards explored in great detail by case-by-case analysis with various computational methods.

So what comes out of it is this very simplistic pie chart. What it tells you is, first of all, that there are quite a few highly divergent versions of known cas genes, such as, for instance, Cas5 and Cas7, so-called RAMP domain-containing proteins, which are known to evolve very fast, to be very highly divergent, and we identified
many additional families, which are important but unspectacular findings. But we also detected quite many new proteins (new, of course, in this context; they are not necessarily completely new) that appear to be strongly associated with CRISPR-Cas systems but have not been described as potential Cas proteins before. And we also have to indicate that this is not a rigorous procedure. To be completely honest, we don't know how to develop any kind of rigorous statistics for this type of analysis. So case-by-case examination shows that about half of the high-CRISPRicity genes, for various reasons, because of their predicted functions, because of lack of evolutionary conservation, whatever, are unlikely to be CRISPR-associated. That said, let's concentrate on the positive: the crop, I think, is fairly good. There are about a hundred new, or potentially new, CRISPR-associated genes and encoded proteins. So we classify these novel potential CRISPR-associated genes by CRISPR types and subtypes, and we immediately see something quite interesting, namely, that a substantial majority of these new candidates for CRISPR association are found in type III CRISPR-cas loci. There seems to be, in terms of the richness of biological functions, something special about type III; we shall talk about it more.

So, of course, there are burning questions here. How do we decide whether the association is real or, alternatively, a particular gene is just part of a defense island? Because microbial genomes contain defense islands, parts that are enriched with different defense systems, not necessarily directly functionally linked, sort of dumping grounds of microbial genomes where you can insert all sorts of things, and so defense systems accumulate. So how do we differentiate between these different possibilities, and how do we rule out simply historical synteny conservation? For the next few minutes I will just talk about formal and informal criteria that allow us to make some
tentative conclusions about the nature of these associations between various genes and CRISPR-cas loci.

So I'll begin with this example, which is one of the most prominent, most abundant. I indicate here the number of protein clusters: we work with a great number of microbial genomes these days, so we always have to do clustering of genes to be able to analyze them and understand what we are dealing with. After clustering, we have 50/50 families of this particular gene, denoted corA, which is a membrane protein gene whose function is metal ion transport. This gene is found in association with a considerable variety of type III CRISPR-Cas systems. Moreover, it's not found alone: in different subsets of these loci, it is associated with two other new CRISPR-associated genes, which encode different kinds of nucleases. So we don't really know what is going on, but what these observations suggest to us is a functional membrane connection of type III CRISPR-Cas systems, where in all likelihood the CorA protein tethers the CRISPR-Cas system to the membrane and also engages these associated nucleases. It could be a phage DNA transporter, it could be a sensor of phage entry through the membrane, it could be a regulator, it could be, finally, a cell suicide machine. This needs to be studied experimentally, but in a case like this I think we don't have reasonable doubt about the functional relevance of the association of such a gene with the CRISPR-Cas systems. Incidentally, serendipitously, in a previous work, in a small group of bacteria, we already detected the horizontal mobility of this gene, whereby it transfers together with the III-B effector module and joins different adaptation modules and cas6 genes, supporting the conclusion that this gene is really part of the functional module of cas genes. Proceeding
along the list of these newly predicted CRISPR-associated genes: CARF domains. I already spoke a little bit about CARF domains; they are found in many CRISPR-cas loci, primarily type III. Once again, this is the CRISPR-associated Rossmann fold, and as many of the people in the audience probably know, the Rossmann fold is the most common type of nucleotide-binding domain. In the pioneering studies from the Jinek and Šikšnys labs, it has been shown that in type III it functions as an oligoadenylate-binding domain. But in this analysis we discovered a new variety of CARF domains, which are very divergent from others and are linked to transmembrane helices in the respective proteins. Moreover, many of them are associated with distinct proteases of the Lon-like family. So probably we're observing here evidence of membrane-associated stress signaling linked to CRISPR-Cas systems.

More membrane connections of type III CRISPR-Cas systems: in this case, completely uncharacterized membrane proteins that are found, however, in many type III systems from Actinobacteria and, again, can be predicted to be involved in membrane signaling. So there are quite a few more examples like this. I don't have the opportunity to go through all of them. I just wanted to give you a visual impression on this slide of the complexity of the organization of type III CRISPR-Cas systems, which exceeds anything that we see in other CRISPR variants. Here we have previously identified CRISPR-associated genes and quite many genes identified in the course of this project that I'm trying to describe now. So I think this is a big challenge for microbiologists who may be interested in CRISPR biology beyond adaptive immunity.

We shall continue along these lines but now go away from type III and talk for a second about type IV. Type IV is a special kind of CRISPR-Cas system, still very poorly characterized. They primarily, although not exclusively, reside
on plasmids, and they don't have any nuclease that could be responsible for interference. They're found in plasmids from quite many bacteria, and as far as we can predict, they're incapable of interference, so they have to do something else. What we observe through the CRISPRicity analysis is that many of these systems contain a protein of the CysH family, a huge family of adenine nucleotide alpha hydrolases. We still do not know what these proteins do. Again, the best idea is some type of signaling function related specifically to type IV.

More on this right now: from type III to type IV, from type IV to type I. What we observed in a particular subset of type I-E systems, found primarily, actually even exclusively, in diverse Streptomyces, is the association of a particular variety of I-E systems with genes that encode the so-called STAND NTPases. For me, this was a case of very welcome déjà vu, because my colleagues and I, quite many years ago, did a comprehensive analysis and identified this family of very interesting ATPases and GTPases, which include, incidentally, the key signaling proteins of eukaryotic programmed cell death. There are many bacterial and archaeal homologs; no one really understands their functions. I hope that these findings that I'm describing to you now should trigger a change in this situation. STAND denotes signal transduction ATPases with numerous domains, because indeed, in addition to the ATPase domain, they typically contain repetitive domains, such as TPR repeats, WD40 repeats or others. What we see here is a stable association of these NTPases with a subset of I-E systems. So we do phylogenetic analysis, in this case of the Cas5 family from I-E systems, and we detect a number of distinct branches that include Cas5 proteins from loci
devoid of the adaptation module, which again suggests functions of CRISPR-Cas systems beyond adaptive immunity. So when we zoom in and perform the phylogenetic analysis of this particular branch, what we see is a very interesting picture. We see a particular branch in which the I-E loci contain these STAND NTPases associated with TPR domains and another, smaller branch, where they contain STAND NTPases associated with WD40 domains. We also see that in these branches the Cas3 protein, which is responsible for interference, initially degrades, losing the helicase domain, and then disappears, apparently rendering these systems incapable of interference. So these are the observations on these NTPases associated with CRISPR, which really strongly suggest a link between CRISPR-Cas systems and either programmed cell death, so this could be defense without interference, through programmed cell death, or other signal transduction roles. Interestingly, some of these NTPases are also fused to a caspase domain, a protease which in all eukaryotes is a key effector of programmed cell death. So perhaps such processes remain to be discovered in bacteria.

Very quickly, I will talk about a formal approach to differentiating functional associations from purely historical conservation or spurious co-occurrence. We try to do this through coevolution analysis. For each pair of loci, we build phylogenetic trees and derive distances from these trees, and we do this for effector module genes, for the species tree, for 16S RNA, and for the gene of interest. So, schematically, if we are talking about a real functional association, we will observe a strong correlation between these distances, like this. If we are talking about something spurious or purely historical synteny, we will not observe such an association. So as a control, there is this very strong correlation between cas5 and cas10 genes for type III CRISPR-Cas systems, and now we move to corA, to the newly
predicted ancillary gene, and you see that the correlation between the phylogenetic histories of corA and cas10 is about as strong as that between cas10 and cas5, the bona fide cas genes, whereas the correlation with 16S RNA is quite weak, indicating a history of horizontal transfer of these genes. In other cases, we actually see signs of apparent vertical evolution for newly discovered accessory proteins, such as with the membrane CARFs.

So to summarize this part: basically, we developed a systematic procedure for detection of CRISPR-associated genes, with flexible parameters. We can play with this quite a bit and, of course, apply this to other functions in microbial genomes. We notice that the core cas genes are already known, but that said, many divergent subfamilies, subgroups, remain undetected and need to be discovered through the use of such methods and powerful sequence similarity detection techniques. We identified many new accessory genes and found that type III systems are, in a sense, the most interesting and most complex. This is the home for the great majority of new genes, with the emerging themes of membrane association and various forms of signaling. I don't know what is special about them; I am suspicious that maybe they are actually ancestral to all CRISPR-Cas and through their long history accumulated all these additional functions, resulting in functional versatility. But what I really want to emphasize is that there is so much biology to be studied here.

And here's another point that I want to emphasize, which is non-interfering CRISPR-Cas systems, those that contain no enzymes for target cleavage and, accordingly, perforce do something else, something other than interference. These are the type I-E STAND NTPase-containing loci that I just described, where programmed cell death or signal transduction is implied. These are the, so to speak, minimal type I-F systems carried by transposons that we described about a year ago, which may be involved
in transposon integration. These are type IV systems, about which I spoke a bit today, which may be involved in plasmid maintenance in ways that we still don't understand. These are inactivated type V-U systems, and I am sure more of such systems are coming as we explore microbial genomes. So I really want to bring home the point that it's an oversimplification to think that CRISPR-Cas systems are just adaptive immunity systems in microbes. Apparently they can do much more, and we have to figure out what.

In the remaining few minutes I will completely switch gears and speak a little about new roles for CRISPR RNAs. So far we have been talking only about protein-coding genes; now a few moments about CRISPR RNAs. We know very well, it has become famous, that type II CRISPR-Cas systems use tracrRNA for pre-crRNA processing and interference, and the tracrRNA contains an anti-repeat region and forms an imperfect hybrid with the CRISPR RNA. So we were interested to search more broadly in microbial genomes for possible repeat-derived non-coding RNAs that may be somehow involved in CRISPR-Cas function. We did a comprehensive search and came up with something that in part is a cautionary tale, namely, that something that looks like anti-repeat sequences, which form a decently stable hybrid with the repeat, can be found in quite many places in the genome. So here we have the delta G distribution, and there are many of these, quote unquote, anti-repeats in different places in the genome that are probably irrelevant. Here you have the highly stable repeat-repeat base pairing; here we have the also quite stable tracrRNA. But here you have something interesting: a little weaker than the tracrRNA but still quite stable anti-repeat upstream of the CRISPR array. This is found in a variety of type II-A systems, some type II-B systems, and in type VI systems as well. And it has been shown that in type VI-A this small anti-repeat RNA is highly expressed. We
still have no idea what they might be doing, but the combination of high stability, low delta G of co-folding, evolutionary conservation and, in this case, expression, shown in this case and a few other cases in Rodolphe Barrangou's lab, suggests to me that these things do have some function in CRISPR systems.

The final theme that I want to cover very quickly is the search for CRISPR in bacteriophages. There are several examples of complete CRISPR-Cas systems identified in bacteriophages and apparently involved in anti-defense functions, and these provocative discoveries prompted us to do a comprehensive search. This analysis is still underway, but I will give you a quick and, I think, striking story. This is the story of conserved mini CRISPR arrays in bacteriophages. This figure illustrates several phages of Streptococcus thermophilus, and all these phages, in a particular region of the genome, the region separating the genes for the small and large terminase subunits, contain this unique feature, namely, a CRISPR repeat identical to the type II-A repeat in the host and then another, either complete repeat or half repeat, and a unique spacer between them. There are no cas genes and no tracrRNA in these phage genomes, but there are some interesting features here suggesting that these arrays are likely to be expressed, namely, good, strongly predicted promoters, although the leader region and the Cas1 recognition motif are different from those in the host, suggesting that integration here might not be possible. And of course we asked the question: there are these interesting mini arrays in bacteriophages; what might be the spacers? And here we came up with a big surprise, namely, that the spacers in these phage mini arrays target none other than the terminase genes in the same bacteriophages. Except not really: they don't literally target the terminase genes in the same genome, because there are several
mismatches that in all likelihood preclude targeting, but there is a perfect match to the terminase genes of closely related bacteriophages. This is a repeated pattern that we observe in a variety of these bacteriophages; there are several other examples. So what we think happens here is phage wars: apparently, the phage recruits the host's CRISPR-Cas machinery to express the spacer, the CRISPR RNA, from these mini arrays and preclude superinfection by the related bacteriophages. This remains to be tested experimentally; I think some experiments are going on in Sylvain Moineau's lab.

So, the general conclusions. I think we fairly well understand the basic CRISPR mechanisms, and the most common CRISPR variants are known, although many new interesting ones for sure remain to be discovered. I'm not saying we have entered the phase of diminishing returns in this basic CRISPR research, but perchance we may not be that far from it. However, I think we are only starting to scratch the proverbial surface of the broader CRISPR biology, particularly connections with diverse microbial signal transduction systems, and also there are many ecological, so to speak, roles of CRISPR: there are many ways that CRISPR is used in the arms race by the hosts and by the viruses. All this remains to be explored, I think, for years to come.

So I have to thank the people who participated in the work: the evolutionary genomics group at the NCBI; the main CRISPR guru, Kira Makarova; Sergey Shmakov, who discovered several new types and subtypes of CRISPR and designed the computational pipeline for the CRISPRicity search; Yuri Wolf, who is not marked for some reason, but he is here, and who contributed a lot of thinking and a lot of computational wisdom; Guilhem Faure, who explored the RNAs; and of course the collaborators: Feng Zhang, much work done together, and without collaboration with Feng's lab none of this would really make
much sense; Konstantin Severinov from Skolkovo; Joe Peters, with whom we work on Tn7 transposons; and of course, last but absolutely not least, the Arbor Biotechnologies company, in particular Winston Yan, David Scott and David Cheng, with whom we enjoy a vibrant ongoing collaboration. Thank you very much for your attention. [Applause]

Regarding the phage wars, one can imagine that off-target activity could target the phage's own genome. So would you expect that in this CRISPR system the off-target activity is very low?

Biological reasoning clearly suggests that, and, if you could notice, there are quite a few mismatches. If I recall correctly, these are actually the same systems that were studied in the classic 2007 paper by Barrangou and coworkers, and, if you may recall, in that paper, which kind of started the whole thing, a single mismatch eliminated the protective effect. These are those very same systems. So indeed, I suspect that there is practically no off-target activity; they are well protected, there are several mismatches.

Hi, what about the CRISPR array that was found like three years ago in a giant virus? Is it functional?

Okay, I should have a rule that you owe me a dollar if you ask me that question, because so many people do. There are no CRISPR arrays in giant viruses. There are no CRISPRs; there is only a red herring. This so-called MIMIVIRE, or whatever it is, is kind of an interesting locus in mimiviruses. It contains repeats, which sort of trigger the thinking about the connection to CRISPR, and it may be involved potentially in defense, but if so, it remains to be studied; the experiments are inconclusive. But if there is some defense function, it is certainly realized at the protein level, and mechanistically or evolutionarily it has nothing, zero, to do with CRISPR-Cas. So please don't talk about CRISPR arrays in giant viruses; there are none. For that matter, to bring
home a general biological point, there are no CRISPR systems in eukaryotes or eukaryotic viruses. In a few minutes I'm ready to talk about why, but let's just remember this.

And another question, super fast: what about these ancillary proteins? Do most of them work at the level of RNA or DNA, the defense functionality of these neighbor proteins?

The accessory proteins that we have been talking about? There are different ones. Most of them do neither; most of them do something else, such as, say, membrane connections of type III CRISPR systems, at least this is what we suspect, because these are definitely membrane proteins, or signal transduction connections, such as with the STAND NTPases, at least this is what we suspect, because they definitely have signal transduction functions. Some of them are nucleases, and those nucleases that we detect...

Very quick: so we know the phages are expressing anti-CRISPRs to combat the CRISPR response from bacteria. So now the phages have CRISPRs to attack other, similar phages. Do the phages now have anti-CRISPR systems as well?

Oh, you mean, is it the case that the same phage genome would contain anti-CRISPR proteins and these CRISPR mini arrays? Yeah, the answer is, I don't really know. Experimentally, no anti-CRISPR proteins have been identified in those phage genomes. It is difficult to develop a computational strategy to predict anti-CRISPRs, although we are actively working on that, and once we are done, I will be able to tell with greater confidence. Right now, I think more likely these are exclusive, but we don't know for sure.

Okay, so Eugene will be around for the rest of the evening, so let's go out and have a drink, and you can continue to badger him with questions about CRISPR defense. Thank you, everyone.
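The two quantitative ideas in the talk, the association score (occurrences of a gene family near CRISPR-cas loci divided by its total occurrences, then filtered by abundance) and the coevolution check (correlation between tree-derived distances for two genes), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline the speaker describes: the function names, the family names, the counts and the thresholds are all hypothetical, and the distance-to-locus factor mentioned in the talk is left out.

```python
from math import sqrt


def crispricity(near_crispr, total):
    """Fraction of a gene family's occurrences that lie near CRISPR-cas loci."""
    if total <= 0 or not 0 <= near_crispr <= total:
        raise ValueError("need total > 0 and 0 <= near_crispr <= total")
    return near_crispr / total


def candidates(counts, min_total=10, min_score=0.5):
    """Keep families that are both abundant enough and specific enough.

    counts maps a family name to (occurrences near CRISPR-cas loci,
    total occurrences). Both thresholds are illustrative assumptions.
    """
    selected = {}
    for family, (near, total) in counts.items():
        score = crispricity(near, total)
        if total >= min_total and score >= min_score:
            selected[family] = score
    return selected


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of pairwise distances."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("distances must not be constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical counts: family_b scores 0.5 from only two occurrences,
# the rare-gene case the talk warns about, so the abundance filter drops it.
example_counts = {
    "family_a": (45, 50),
    "family_b": (1, 2),
    "family_c": (30, 400),
}
selected = candidates(example_counts)

# Hypothetical pairwise distances for the same locus pairs in two gene trees.
correlation = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

With these made-up numbers only `family_a` survives the filter, and the two distance lists are perfectly correlated. In the talk's terms, a strong correlation with a bona fide cas gene combined with a weak correlation with 16S RNA distances is read as evidence of a functional association with horizontal transfer.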
Info
Channel: Broad Institute
Views: 10,538
Rating: 4.8418078 out of 5
Keywords: Broad Institute, Broad, Science, Institute, of, MIT, and, Harvard, Genome Engineering Workshop 2018
Id: khwOuh1qG0g
Length: 160min 50sec (9650 seconds)
Published: Fri Jun 01 2018