Gene Instabilities/Accelerated Regions in the Human Genome

This UCSD-TV program is a presentation of University of California Television for educational and non-commercial use only.

Now, by way of introduction: as you probably know, we humans shared a common ancestor with the orangutan about 13 million years ago, with the gorillas about 8 million years ago, with the chimpanzee and bonobo about 6 million years ago, and with our closest extinct evolutionary cousin, the Neanderthal, about half a million years ago. You'll be hearing more about a lot of this along the way.

Now, I realize there's an audience here that ranges all the way from really hardcore geneticists, who know a lot more than I do, to educated laypeople who are interested in the topic. So I hope the experts will bear with me while I give a general, very brief introduction to genetics. One of the problems we face in CARTA is that we have so many specialties, so many complexities, and so much jargon that even amongst ourselves we have trouble keeping up with all the jargon. So I'm going to give just a very brief introduction, and apologies to anyone who feels that it's too simplistic.

As you know, there's DNA in the cell, both in the mitochondria and in the nucleus; we're mainly talking about the nucleus today. Chromosomes are where the DNA is packaged up by histones, and if you look at the level of molecular detail you can see the double helix, with the base pairs that keep the DNA together and that constitute the genetic code. You'll hear phrases like "5 prime to 3 prime"; that really tells you which strand you're on and which direction you're running in.

Now, there is this reductionist view of biology that DNA makes RNA makes protein, and there's a tendency therefore to think that cells, tissues, and organisms emanate from this simple paradigm. But that's like saying that if you have Betty Crocker's cookbook you have a meal, just like that. There are a lot of other things that happen along the way, obviously. So a more complete view of biology would be that you also need lipids,
you need glycans, and you need these to come together in glycoproteins, proteoglycans, glycolipids, cells, matrices, tissues, and organisms. Of course things feed back to DNA and RNA, but don't forget the microbes and parasites, the physical environment, and, in the case of species like humans, the cultural environment. So this is a more complete view of biology. Today we are going to be a bit reductionist and focus mostly on DNA, RNA, and proteins, but keep the bigger picture in mind, and we'll occasionally refer to it.

As you know, each of you has chromosomes derived from your parents, and if you're male you have an X and a Y instead of two X's. There's also mitochondrial DNA, which is in all cells. During sexual reproduction you get uniparental, or clonal, inheritance of the Y and of mitochondrial DNA, while recombination of the other chromosomes can take place.

So, some very basic terminology. The genome is all of your DNA sequence. A locus is a place on a chromosome in the genome, so you can have a genetic locus, or multiple loci. You can have alleles of the same gene, which are alternate forms found at the same place on the same chromosome. You can have haplotypes, which are combinations of alleles at multiple loci that are transmitted together on the same chromosome. At a genetic locus you'll very often find a gene, but the gene is broken up into exons, which are the coding regions for amino acids, and you'll hear about enhancers and promoters that affect the gene. You transcribe this DNA into RNA in a primary transcript, and you'll hear about the five prime untranslated region and the three prime untranslated region at either end of these genes. This messenger RNA has to undergo splicing to produce a processed transcript and give you a protein. But the big new elephant in the room over the last few years is the fact that a lot of our RNAs are not coding and
are doing a lot of very interesting things that are very important biologically. You'll hear about that. So that was a very brief overview; I obviously left out a lot of terminology, but it's just a few words to keep us thinking along these lines.

We're going to cover the genetics of humanness, and one of the things we decided was to run the program in this general direction: we start at the big-picture level with entire genomes, then work our way through segments of genomes, then you'll start hearing more about RNA, accelerated regions, and gene regulation networks, and eventually we drill down to a few examples of single genes. Not shown here is that in the final closing remarks Pascal Gagneux will put things into perspective by looking at the even bigger picture. The next speaker is Evan Eichler of the University of Washington, speaking about segmental duplications and deletions.

So thanks, it's a real pleasure to be here. Thanks to Ajit and Elaine for organizing such an interesting and, I think, still topical area of discussion; it will probably be topical for the next 50 years. My perspective on this has really been from a slightly different point of view. While we've talked a lot, and we'll hear more, about single nucleotide variation, for roughly the last 15 years my lab has been interested in larger forms of genetic variation: deletions, duplications, and inversions, particularly down to the level of about one kilobase in size. More recently, with the advent of new technologies, we've been able to push this even further, down to about 50 base pairs in size.

My initial interest in, and passion for, this field comes from the study of historical variation, historical copy number changes, and particularly duplications. It comes from two perspectives, and I'm summarizing the work of about 60 or 70 years of others here, but I think they really summarize why I
think duplications are so powerful in terms of evolutionary process. The first has to do with functional import. When you duplicate sequences and make extra copies of them, by definition you free that sequence from evolutionary constraint. So if there are genes within those areas, they can actually evolve, and eventually new functions can emerge. In fact, duplication is the primary force by which new genes evolve within any species; whether you're a cricket, a snail, or a human, this is the primary mechanism.

The second is with respect to the structure of the genome. When you create additional copies of duplicated sequences, you predispose that genome, or more precisely that area of the genome, to additional rounds of deletion and duplication. This was recognized by people like H. J. Muller and Sturtevant long before we even knew what DNA was; they observed the Bar locus in Drosophila, for example, as a site of genetic instability that was related to duplication. So throughout this talk I'm going to touch on both of these: the role of duplicated sequence, particularly within great apes and humans, in the emergence of new genes, and also its role in genetic instability.

So, a little bit of background. The term I will use, and unfortunately the title of the talk, is segmental duplications. Segmental duplications are nothing more than recently duplicated sequences, defined at the genomic level as pieces of DNA greater than a kilobase in size and with greater than 90 percent sequence identity. For the purpose of this talk, what you need to understand is that there are differences in the distribution of these sequences. They can be distributed within a chromosome or between different chromosomes, in which case we refer to them as intrachromosomal versus interchromosomal duplications. The other important piece of information is that they can differ in their configuration. They can be separated
from their ancestral sequence by large distances, in which case we call them interspersed, or they can be side by side, in which case we call them tandem. All right, so that's the background.

So what does the human genome look like with respect to this property? This is a slide that I often show, but I think it summarizes quite nicely the pattern of large segmental duplications within our genome. I'm looking here at only the largest, so greater than 20,000 base pairs, and the most identical, so greater than 95 percent sequence identity. These are all evolutionarily quite young. Every little blue line that you see here represents a sequence that has been duplicated intrachromosomally. There are two things I want you to get from this. The first is that not every chromosome has been treated the same with respect to duplications. Look at chromosome 7, chromosome 16, chromosome 17: they've been bombarded by a large fraction of duplications, while other chromosomes have been essentially quiet. The other thing, which you maybe can't get from this, is that a large fraction of the duplications are interspersed. That is to say, they're not side by side with their ancestral sequences but are spread long distances from their ancestral homeland.

Now I'm going to show you the interchromosomal pattern on top of this. This is the pattern when you add the exchange of information between different chromosomes that has occurred historically. This might look like complete chaos, but I assure you it's not. There is actually an organization to this: you'll see that there are specific biases, and that regions near centromeres, represented here, or near the ends of the chromosomes are particularly prone to this process.

So how does this compare to other organisms? This is probably the best sequenced genome other than human that's available, and that's the mouse genome, more specifically a strain that we call C57 Black 6. And this is the
pattern for recent duplications at the exact same sequence identity and the exact same size. What's not impressive is the difference in proportion: the mouse genome has about the same proportion of recent duplications, although the sequences are completely different because they're all evolutionarily quite young. But one thing you should see is that the pattern is very different from human. You see very few interchromosomal duplications, but more importantly you see that the intrachromosomal duplications, shown here in blue, are right on top of one another. So in the mouse, most of the duplications that have evolved are in tandem or very nearly in tandem, while in humans and in great apes the vast majority of the duplications are interspersed.

So which is the mammalian archetype? I won't show you the data, but we've had the opportunity to look at the genomes of elephants, dogs, rats, platypus, and marsupials, for example, and I can tell you that an elephant genome's architecture with respect to duplications looks a lot more like a mouse's than a human's or a great ape's. In this regard, humans and great apes seem to be the odd man out, so to speak, or the odd hominids out, with respect to duplication architecture.

So what about the duplication architecture in humans and great apes? In terms of the timing of these events, we've had the opportunity to look at chimpanzee, orangutan, and other species, and more recently we've sequenced our own gorilla genome to get an estimate of the timing of the different duplications. This Venn diagram simply shows you the most recent duplications and how they relate to one another: whether they're shared, as in the intersect between human and chimp, or specific to one lineage versus another. The numbers are the total millions of base pairs involved. There are two things that you can get from this just
looking at this map of duplications, which goes back to about 25 million years ago. You can see that the orangutan circle is significantly smaller than that of human or chimp, and this is something that we've been able to validate experimentally and in a number of different genome assemblies. So there is less duplication in the orangutan lineage compared to chimp or human. The other thing you'll see is that the portion of the circles shared between human and chimp is quite large.

In fact, if you try to estimate how many megabases have been added by duplication along the hominid lineage, this is the kind of tree that we get; we've now added gorilla, in addition to macaque. The big number here is the absolute number of millions of base pairs that have been added, and the smaller number is the number of millions of base pairs added per million years. The one thing I want you to see is right here: right around the time of, or just before, the separation of human, chimp, and gorilla, and we think shortly thereafter, there was a burst of duplication activity. We have on the order of three times the number of megabases duplicated in this ancestral branch before the separation of chimp, gorilla, and human.

This is interesting because, if you look at texts that have been written over the last 20 years on this subject, most people have had this paradox: why are there so few genetic changes over the millions of years that separate humans, chimps, and gorillas? That is because most of those studies have been done on single-base-pair changes and small insertions and deletions. In terms of duplication, it appears we have an episodic burst of activity at what I would argue is a critical time during evolution.

Another piece of information I want to share with you, and it's actually kind of a
model at this point, is how these duplications have grown in complexity specifically within the great ape and human lineage. I'm not going to show you all the data underlying this, but reconstructing the evolutionary history of these duplications suggests that they've grown in series. Particular sequences, which we often refer to as core duplications or core duplicons, have moved and jumped to a new location, duplicating a copy of the old in a new place. Then, as the core has jumped again in subsequent rounds of duplication, it has picked up the flanking sequence around it and created a larger, more complex pattern of duplication. This is subsequently duplicated again, once more picking up unique sequence at the flanks.

So evolutionarily, when you look at these architectures you'll note a couple of features. Number one, they're not pure: there's not just one sequence, but they're mosaics of different pieces of DNA that have been stitched together. The second thing you'll see is that the edges are much younger than the center portion, the core, which is the oldest. There have also been exchanges of information between these copies, creating a very complex architecture in our species compared to other primates.

To give you an idea of how complex this can get, this is chromosome 16, showing you a schematic of about 15 locations that are all about 50 million years old or younger in terms of their origin. If I could go back in my time machine, none of these locations would exist with this kind of complexity. Each little colored bar here represents a different piece of DNA that has been duplicated to this specific location, but it has occurred in series. If you go back and compare this chromosome, chromosome 16, to, let's say, a baboon or macaque or any other Old World monkey species, what you find is that all of
these pieces of DNA exist as a single copy in that particular genome. So if this represents the archetype, humans and great apes went from this architecture to this architecture in a span of about 25 million years or less, adding about 10% additional euchromatin, or additional chromosome sequence, to chromosome 16 in a very short period of evolutionary time. That is really unprecedented compared with most studies of evolution. Similarly, if we look at our cousin the orangutan, we can see similar patterns where duplications have burst onto the scene, not as complex nor as prolific as what we've seen in human. But you'll notice that all the sequence colors here are different from what we see in human, except for one, which is the core sequence. It has jumped to completely independent locations on different chromosomes during the course of evolution.

All right, so why do I show you all this? Well, this architecture, with all these duplications that are spread out, large, and highly identical, creates problems for our genome. The reason is exactly what has already been mentioned: meiosis is fundamental to the recombination process that leads to our chromosomes being different, and the way that process knows how to pair mom's and dad's chromosomes is by sequence homology. So when you have big chunks of sequence that are virtually identical, you can trick that recombination process and initiate a recombination event where you shouldn't. Here are two chromosomes of four that are misaligning during meiosis. This big chunk of duplication, which sits at two separate locations, misaligns during meiosis, and now you create gametes that have additional copies of that duplicated sequence or have lost copies of it. More importantly, because the duplications are interspersed, everything that's bracketed by them gets taken along for the ride, and
now you have gametes that have additional copies of genes A, B, and C in addition to the duplication, or that have lost them. This creates a huge amount of genetic diversity in our population, and as a result it also creates disease, because sometimes you remove entire swaths of five million base pairs that contain six genes, and having only one copy is not sufficient for proper development. Years ago we identified 130 of these regions, and we have systematically shown by studies of human disease that about 45 of them are important in causing sporadic as well as inherited mutations as a result of this process. These affect diseases such as autism, intellectual disability, developmental delay, epilepsy, and schizophrenia. So our genome is predisposed to these diseases because of the architecture that has evolved.

These are just a few of the 45 microdeletions that we've characterized. This is an example of a very common one, the second most common genetic cause of autism, or autism spectrum disorder: a deletion of this particular segment. Here's another segment, on chromosome 15, which is unstable and associated with schizophrenia as well as epilepsy in the general population; it is probably one of the most common causes of generalized epilepsy in the human species.

Going back to that chromosome 16 view, this is the beautiful duplicated architecture that has evolved on that chromosome. Rearrangements between these big blocks result in this form of mental retardation and schizophrenia; rearrangements between blocks 9 and 10 result in syndromic intellectual disability; and rearrangements between blocks 12 and 13 give the one I already mentioned, one of the most common causes of autism. All of this architecture has evolved specifically in our species. So the question that some of you might be asking, and I know we've asked this for the last 10 years, is
essentially: why would we possibly have this type of architecture if it's predisposing our population to an increased burden and increasing our susceptibility to diseases such as autism, schizophrenia, and intellectual disability? The answer, I think, partly lies in this: these core sequences are not just generic sequence; embedded within them are rapidly evolving genes and gene families. Our group, along with about four other groups, has characterized these gene families over the last few years, and these are just a subset; there are about six core duplicons that have carried very rapidly evolving gene families.

Shown here are some of the genes. This is the first one, which we published in 2001. It is a gene that shows extreme positive selection; it evolves about 50 times faster than a gene under purifying selection. That is to say, the additional duplicate copies being created are extremely divergent from the ancestral sequence from which they came. Here's another gene, a fusion gene called TRE2. It has been a marker for bladder cancer for some years, but interestingly it is thought to regulate cellular growth, specifically in rapidly dividing cells. Here's another gene, described by Peer Bork; it's a RANBP2 binding protein whose ancestral progenitor is a nuclear pore protein. And this one, described by Jim Sikela in 2006, is a gene that has been loosely described as a neural oncogene, expressed in neurons as well as, again, in rapidly dividing cells, particularly cancer cells.

What's cool about these genes is that they have no really clear orthologs in the mouse; if you find them, they often don't produce a transcript that's going to be functional. They have multiple copies in chimpanzee and human. They have dramatic changes in their expression profile, so if you go back to an Old World monkey species you'll see them expressed in totally different tissues. And about half of
these are showing extreme signals of positive selection. But none of these have been characterized in terms of their phenotype, so there's a big question mark over the function of these genes, largely because they're repetitive in nature and most of the methods we've developed are designed to characterize unique sequence and unique genes.

This is just to show you an example of one of these, one that we've worked on for the last ten years. This is a gene called nuclear pore interacting protein. Humans have 20 copies, and some humans have 15; chimps have 30 copies, but about a third of them are completely different in terms of their location. What we've been able to do is take the copy that was, let's say, a single copy in a baboon, put it into a mouse in a BAC, making a transgenic, and do the same with human copies. This is unpublished, but what we found is that every human copy that we've put into a mouse shows expression patterns that are very specific to particular cells. What I'm showing you is a part of the human brain, the dentate gyrus, and the staining that you see here is RNA staining. We can see that there's a high level of expression of this gene family in neurons, but more importantly in areas of active neurogenesis: the dentate gyrus and the cerebellar granular layer of the human brain. If we do the same experiment with a baboon, there's essentially zero expression in either of these two areas. So stay tuned; I think these genes are going to be much more important in terms of understanding human function.

The other thing that we can do now, in part because we have so many human genomes to compare, is start asking specific questions, not about duplications shared between human, chimpanzee, and gorilla that may differ in copy number, but about which genes are actually duplicated
in the human lineage, and, of those that are duplicated specifically in the human lineage, which have become fixed. I'm showing you here data from a paper we published at the end of last year, analyzing 155 human genomes for copy number of these duplicated genes. This is the copy number of a given gene shown in Asian, European, and African populations, as defined by the genomes that were analyzed in that study, compared to, in gray, chimpanzees, orangutans, and gorillas. We also had the ability to analyze the Neanderthal, shown here in brown. The important point I want to make is that because we have now analyzed 155 human genomes, we can clearly say that these specific genes are duplicated, and duplicated only, in our lineage.

These are some of the genes that pop up. I don't expect you to be able to read many of these, but just to give you a flavor: this is a gene called GTF2IRD2; deletions of this gene have been thought to be important for visual-spatial processing in the brain. This gene, GPRIN2, is important for glutamate-induced neurite growth in the brain. CHRFAM7A is a nicotinic acetylcholine receptor fusion gene that's specifically duplicated in human. HYDIN is important for fluid transport across the brain; it's actually a structural protein, and mutations in it result in hydrocephalus. And SMN, for example, is a gene important for motor neuron maturation. The important point here, and this is actually statistical, is that in pretty much any analysis you look at, the types of genes that we see specifically duplicated in the human lineage are disproportionately important in brain development.

And this is my favorite, my last example. It turns out we have an expert in the audience on this specific gene, actually here in San Diego. It is a gene known as SRGAP2. SRGAP2 is a SLIT-ROBO Rho GTPase-activating protein gene. These genes have
been known to be important in brain development in every mammal for many years, but SRGAP2 functions primarily to control the migration of neurons and dendrite formation in the cortex. Shown here is a little pictorial taken from Franck Polleux's paper, or an editorial on it. It simply shows that the expression level of this gene is critical for determining how far a neuron migrates from the ventricular zone up to the cortical plate. So it's kind of a Goldilocks thing: it's got to be not too much and not too little, and then you go to the right spot and begin to form dendrites.

So here's the cool part. This gene is an example of one that's been duplicated specifically in the human lineage; we've now estimated that the duplications are two to three million years old. The duplications are large, and they're not represented in the human genome assembly, precisely because they're large and highly identical. So if you look at the human genome assembly you wouldn't see these duplicate copies. What's interesting about these duplicate copies is that they're expressed, and expressed in fetal development, and some emerging data suggest that they could act like an antagonist of SRGAP2, helping to more finely regulate when and where SRGAP2 exerts its function. So, an antagonist of the parental copy.

In summary, I've talked about a unique feature of human, or human and great ape, genome architecture, which is this interspersion of duplications. I've talked about how that architecture is kind of bad karma for us, predisposing our genome to a burden of large copy number variation associated with neurologic, neurobehavioral, and neurocognitive disease. I've talked about how that architecture has evolved recently, with a focal point on specific segments of DNA that have kind of marched across chromosomes, creating this
architecture. And I've talked about how those pieces of DNA that are so prolific are associated with genes of unknown function, but genes that I think are tantalizing in terms of their signatures of selection and evolution, and, maybe more interestingly, the fact that they're flanked by human-specific genes, which we know are disproportionately involved in brain function.

With that I will end. I just want to acknowledge these two folks here, Tomas Marques and Zhaoshi Jiang, and I should also acknowledge Matt Johnson, whose work I largely presented today, and then obviously great collaborators clinically. More importantly, I should emphasize that our ability to study these difficult regions of the genome requires genome centers that are still dedicated to excellence in the quality of the sequence being generated. Despite the fanfare of next-generation sequencing, with claims that we can sequence thousands of genomes, we still haven't sequenced the first human genome completely, and we need that to understand the true diversity and complexity of our species. Thanks.

So it's a pleasure to welcome Katie Pollard from UC San Francisco to the podium. Her topic is human accelerated regions in the genome.

Great. So thanks, Elaine and Ajit, for the invitation. It's great to be here; it's been a really exciting first part of the session. If I understand right, my job is to move us from the genomes view into the gene view. I'm certainly going to start with genomes, and hopefully provide a bit of a transition into a more focused look at specific genes and parts of the genome that have played roles in human evolution.

This question we're thinking about today, about what makes us human, isn't a new one. Humans have been comparing ourselves to other animals, especially our closest living relatives, for eons, most likely, and many of us in this room belong to disciplines that arose out of this
intellectual curiosity. In anthropology, for example, approaches focus on fossils, archaeology, and the behavior and biology of living primates, including ourselves. But the researchers in the symposium today are in a relatively new field compared to these approaches. We address similar questions, but we're using DNA sequence data, as we've already heard in this first session today. The way my lab specifically explores the genetic basis for humanness is that we try to pinpoint the parts of the human genome that are most different, our DNA that's most different between humans and chimps or other primates, and then try to link these to human-specific biology. This is a question you're hearing come up over and over again in the talks today.

As we've heard, many of our traits are shared; humans really are just great apes. But there are some ways that we're different, and these span all aspects of our biology, from disease susceptibility, which I'm particularly interested in, to behavior and diet. I was part of the international consortium that performed the initial sequencing and analysis of the chimpanzee genome, and this seems like a long time ago now, back in 2005. We now have genomes of many other vertebrates, including a number of primates, and, as we've heard today, very excitingly, even some extinct primates. Neanderthals, as we heard, are our closest relatives, as far as we know, that have ever lived. Chimpanzees have the role of being our closest living relative, one that hasn't gone extinct, and this is important for a couple of reasons. One is that we can get high-quality DNA samples; we don't have to worry about issues of contamination and DNA degradation. But much more important is that we can observe living chimpanzees today. So in trying to make the link from the genome to the traits that we're interested in, we actually have something we can observe: we can observe the soft tissues, we can
observe things like behavior and so both these extinct hominids that are very closely related to us and our closest living relative the chimp play very important roles in asking what makes us human from a genetic perspective and they're very complementary so interestingly getting back to the chimp genome project consistent with the fact that most of our biology really isn't that different from a chimp or a gorilla or orangutan as I'm sure many of you have heard our DNA sequence is not that different so we differ at about one in every hundred base pairs one in every hundred letters in our genome from the chimp genome and if we focus on the parts of our genome and this is only about 2% a little less than two percent of our genome the parts that encode proteins there are even fewer differences than that so our proteins it turns out are nearly identical and there are some proteins that are very different and there's some pretty exciting stories about those that we're going to hear about the rest of this afternoon but what came to me from looking at the chimpanzee genome and from understanding the findings of the chimpanzee genome project was that we need to look beyond the proteins if we want to understand the genetic basis for humanness the story isn't going to only be there so the other thing that we've heard about from Evan and a little bit from Elaine today is that there are these structural variations parts of our genome where we have a sequence and chimp doesn't have it at all or we have multiple copies with slight variations in them and chimp has only one or two copies and some of these are unique to us and some are unique to chimpanzees since our common ancestor it turns out that on a base pair by base pair level these actually make up more of the difference between a human and a chimp so if you've heard that figure cited that we're 99% identical to a chimpanzee those are single letters of DNA we're looking at the corresponding letter between human 
and chimp you're sure you're looking at the same place in the genome and there's been a change it's important to remember that these structural variations although not talked about as much not as well understood not as easily mathematically modelled actually play a very important role as well so I'm gonna touch on both of those a little bit today focus a little bit more on substitutions because we have nice mathematical and probabilistic models for understanding them but I think these structural variations are exceedingly important and are increasingly coming to light so both types of differences between humans and chimps or humans and other non-human primates are important and as I mentioned they affect both the protein coding sequences in our genome and the non-coding parts of our genome and in fact most of the chimp human differences aren't in proteins they're gonna be in these parts of the genome that used to be called junk DNA and it turns out that some of it is junk in the sense that it's not doing a lot to help our biology along but much of it is doing important things and so slowly science is starting to understand this non-coding or dark matter part of our genome that used to be called junk and one of the important things that the non-coding genome does is to control expression of nearby genes so there are things called regulatory elements and they can turn nearby genes on and off you can think of the genes like the building blocks or the bricks and then these are the ways that you can put them together so chimps and humans have basically the same building blocks with some interesting exceptions but what we're interested in pursuing is the idea that you can put them together in different ways so this is exciting it's a new area to focus on since much of science has focused on proteins in the past but it's also very challenging because compared to proteins where we know a lot about their structure and their function through years and 
years of biochemistry molecular biology structural biology very little is known about the non-coding genome but luckily we can let evolution help us with this problem and the reason is that if a sequence is doing something in the genome it's doing something important for your biology or a chimp's biology then it is disadvantageous to change that sequence you might alter the function and in extreme cases lead to a disease or some other condition that's not as favorable so it's best not to tinker around with things if they're working and following that sort of paradigm what we can do now in 2011 is to take all these vertebrate genomes there's about 50 that have been sequenced today and the more distantly related ones things like a mouse or a chicken or a fish are exceedingly helpful for understanding these regulatory sequences this non-protein part of the genome and that's because if a piece of DNA that's a candidate junk DNA it's just out there we're not sure what it's doing is actually playing an important role like turning on a nearby gene during development that helps you start to make cardiac myocytes then it would be a bad idea evolutionarily speaking to tinker around with that sequence and therefore what happens is that the human version if we compare it to mouse chicken or fish is actually not that different it's much more similar than you would expect by chance given the hundreds of millions of years of evolution that separate these species and so what we find in comparing more distantly related vertebrates back to humans is that at least five percent probably more like ten percent maybe even more than that of our genome is very slow evolving it's what we call under negative selection or functional constraint and since we know that less than two percent of the genome is protein coding that means that most of what's important in our genome actually isn't the proteins it's these regulatory or non-coding sequences now this is exceedingly important and one of 
the most important things that's come out of comparative genomics and sequencing different genomes is that by looking at these species that are more distantly related we can actually shed light on and functionally annotate parts of our genome and understand which ones are more important than others in terms of our biology and our health then if we look to a close relative like the Neanderthal or the chimpanzee where most of the genome isn't different most of it's the same the story's the opposite we want to look for the parts that are different so there the genome is nearly identical and what's interesting are the places where there's structural variations or substitutions at single DNA bases and by linking these two pieces of information together we can figure out which of these differences are falling in these elements that are important for gene regulation and therefore for development and normal functioning and health so to look at this at the level of DNA sequence data I just want to show some quick examples these are DNA sequence alignments there's one row for each species it's human chimp mouse and rat in this example but as I mentioned we can line up about 50 vertebrate species now a column represents a place in the genome where we assume that those DNA bases all descended from a common position in the common ancestor of these four species and we can look across this alignment and look for differences so if we compare human and chimp this is about 40 base pairs long there's one difference and since I told you there was one in every hundred base pairs across the genome this is about what you would expect by chance you'd expect zero or maybe one in a sequence of this length well it turns out this is just a random place in the genome that I grabbed intentionally it probably is junk DNA and therefore this represents what would be happening if there wasn't any functional constraint this is the background or what we call a neutral rate of evolution 
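The comparison described above can be sketched in a few lines of Python. This is only an illustration: the four 40-base sequences are invented for the example (they are not the sequences shown on the speaker's slide), and the helper names `count_differences` and `expected_differences` are hypothetical. The only figures taken from the talk are the 40 base pair length, the single human-chimp difference, and the genome-wide rate of roughly one difference per hundred base pairs.

```python
# Toy alignment: one row per species, one column per aligned position.
# These sequences are made up for illustration, not real genomic data.
alignment = {
    "human": "ACGTTGCAAG" "GCTTACGATC" "GGATCCTAGG" "CATTGACCTA",
    "chimp": "ACGTTGCAAG" "GCTTACGATC" "GGATCCTAGG" "CATTGACGTA",
    "mouse": "ACATTGCTAG" "GCTAACGTTC" "GGTTCCTAGA" "CATCGACTTA",
    "rat":   "ACATTGCTAG" "GCTAACGATC" "GGTTCGTAGA" "CTTCGACTTG",
}

# Genome-wide human-chimp rate quoted in the talk: about 1 in 100 base pairs.
NEUTRAL_HUMAN_CHIMP_RATE = 1 / 100


def count_differences(seq_a, seq_b):
    """Count aligned columns where the two sequences carry different bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def expected_differences(length, rate=NEUTRAL_HUMAN_CHIMP_RATE):
    """Expected number of differences in a window if nothing constrains it."""
    return length * rate


human_chimp = count_differences(alignment["human"], alignment["chimp"])
mouse_rat = count_differences(alignment["mouse"], alignment["rat"])
window = len(alignment["human"])

print("window length:", window)
print("human vs chimp differences:", human_chimp)
print("mouse vs rat differences:", mouse_rat)
print("expected human vs chimp under the neutral rate:", expected_differences(window))
```

With a 40-base window the neutral expectation is less than one human-chimp difference, which matches the speaker's remark that zero or one is about what chance would give.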
interestingly if you compare mouse and rat there's four differences that might come as a surprise to some of you but it's actually what we would expect because there's more evolutionary time back to the common ancestor of mouse and rat than there is between human and chimp and our common ancestor and so the idea there is that if a piece of DNA isn't doing anything important it randomly accumulates mutations and those happen at a fairly constant rate over evolutionary time over millions of years and so the amount of sequence difference tells you something about how long ago two sequences had a common ancestor and Eddie talked a little bit about that today in talking about coalescent times between humans and Neanderthals so the prevalent pattern that we see looking across this alignment is that the two primates are similar to each other and they're different from the two rodents and that's because there's actually quite a lot of time back to the common ancestor of all four species so here's another example I didn't take this sequence randomly I picked it very intentionally it's about the same length a little bit shorter and there's no differences at all between the four species and this suggests that there has been functional constraint because if you think about a model for DNA sequence evolution that expects things to look like this then the probability of seeing a sequence like this is actually very close to zero it's very unlikely that you would get that little change maybe not between the human and the chimp it's actually not that weird to see no difference and maybe not between the mouse and the rat but you'd certainly expect the primates to look different from the rodents and then other forces can actually increase the rate of substitutions so if there's what we call positive or Darwinian selection operating it's actually advantageous to change the sequence faster than it would under the neutral or background model and you can also 
have mutational and other processes that increase rates so what my lab does is build statistical probabilistic models for how DNA evolves using these principles I just described and then we use those to search through vertebrate genomes for parts of the genome that are doing unusual or interesting things so the pattern that I want to emphasize today is looking for something called a human accelerated region and I already introduced this concept the idea is that the sequence is evolving differently in one part of the phylogenetic tree or in one set of species compared to the others and in particular we want human to be different and everybody else to be the same so here's an example of a sequence where chimp is identical to the mouse and the rat genome but there are six positions where human is different from chimp which I mark with the little green arrows there and this is highly unlikely to occur we expect human and chimp to be similar to each other and we expect chimp to be kind of different from the mouse and the rat so there's two important things about this sequence one is that the chimp is more similar to the mouse and the rat than you would expect that tells me this sequence is probably doing something important and secondly human is different from chimp I wouldn't expect that that suggests that either that function's been lost or altered in some way potentially in the human genome so we use those two concepts things that are highly similar across the mammals but different between human and chimp we take these mathematical models that I described we perform a statistical test called a likelihood ratio test we have to be very careful about how we implement this on computers these calculations are very intensive and the genome is huge so if I were to perform this on a desktop or a laptop computer it would take about 35 years but using a computer cluster at UCSF which we have which has about a thousand computer nodes stacked up and running in parallel we can actually 
do that analysis in an afternoon so a lot of what you're hearing today from me and from other people is only enabled by these new advances in DNA sequencing technology but also in computing computing plays a huge role in enabling these analyses so what have we found in 2006 we published about 200 of these human accelerated regions we call them HARs for an abbreviation now using 50 vertebrate genomes and some improvements in techniques we've almost tripled that number of elements and these tend to be fairly short about a hundred and forty base pairs on average in length and as I alluded to earlier in the talk and we might have expected from our thousand foot view of the human and the chimp genomes they're mostly not in proteins a large percentage are intergenic meaning they're lying out in between genes and if they're in a gene region they're in those intron sequences that aren't the coding parts or the UTRs so this is exciting we're pursuing the hypothesis that many of these probably are regulatory elements that control expression of a nearby gene so to get a handle on what role they might have played in human evolution it's interesting to see what those genes that have a HAR nearby are doing excitingly many of the genes that have a HAR nearby are themselves transcription factors now transcription factors are proteins that go and turn on and off other genes and so that's really interesting because you could change a few base pairs in the human genome you could change a sequence that alters the expression of a transcription factor you could make more or less of the transcription factor in a particular cell type or at a particular time during development then that transcription factor goes and turns on and off a whole bunch of other genes you could imagine having a pretty major impact on an actual trait like something like the size of a brain or how many chambers you have in your heart or how well you can metabolize starch so there's a lot of things that 
have to happen to go from a genome sequence to a trait or something that we can really latch on to and say yes that's different in a set of species or a species that I'm interested in like human but transcription factors are a powerful way to make a big change like that so this is exciting that many transcription factors have HARs in fact many of the genes transcription factor or otherwise are expressed during development which means that they could play roles in things like how much hair you have how long your bones are the shapes and complexity of your different tissues and many of the genes with a HAR nearby are disease genes more than half of them showing that they're really important genes and that when you do make changes in them they do have impacts on biology and health and I don't have time to go into it in great detail but we've already heard today how important segmental duplications are in terms of duplicating genes and deleting genes in a genome and it turns out genes that are involved in these rearrangements these structural variations also are enriched for these human accelerated regions I want to show you a few examples of specific genes where we're just starting to think about the biology that might have followed from having a human accelerated region nearby and I hand-picked these because they are genes that we know play roles in thinking processes developmental or otherwise that are different between humans and chimps and so first example is the FOXP2 gene it's sometimes called a speech gene because when you have a loss of function of the FOXP2 gene in humans the human can do all the normal cognitive function of language can perform sign language in the same way that a chimpanzee can but can't vocalize and FOXP2 is involved in modulating neural circuits and also controlling fine muscle movements which are very important in the face especially in terms of being able to do spoken language the sonic hedgehog gene several of the 
Hox genes and several of the fibroblast growth factors all have HARs human accelerated regions nearby these genes all play really crucial roles in the basic patterning and layout of the embryo in a variety of different parts from the brain to the limbs to basic cell proliferation another exciting example is chorionic gonadotropin this is the gene that comes on early in pregnancy it's essential for normal implantation and maintenance of a pregnancy and it's interesting that this gene came up as having a human accelerated region nearby because it's already been demonstrated that the protein coding sequence has changed between humans and non-human primates and it also looks as though the gene expression has changed in some specific ways and so we may be getting close to figuring out what actual genetic changes are responsible for those gene expression changes this is really important because humans actually have a very hard time initiating and maintaining pregnancies this is one of our traits that's maybe not been improved during human evolution or maybe it was necessary for things like our bipedalism and our larger brains to have a different type of pregnancy but if you compare a human to a macaque for example we actually have a very high rate of miscarriage and of failed implantation and so this is another interesting example another sort of tantalizing one is a cluster of three genes that are involved in sexual dimorphism and also harbor a human accelerated region that's exciting because at least compared to gorillas humans have much less sexual dimorphism another example I want to share with you is what we call HAR1 human accelerated region one it's numbered one because it was the fastest evolving sequence that I found in this computational scan of the genome it's about 118 base pairs long and there's 18 differences between the human and the chimp genome so that's off scale in terms of how fast evolving it is we would expect about one under 
that background or neutral model that I told you about so it's an order of magnitude faster than expected so HAR1 is a gene meaning that its DNA is made into RNA but it does not encode a protein instead as we heard in the introductory remarks earlier RNA can actually function on its own it folds on to itself forming a structural molecule shown here on the left and interestingly this RNA gene is expressed in a very important type of neuron called a Cajal-Retzius neuron in the developing neocortex so here's the cortical plate and this is something called the subpial granular layer and these cells that express HAR1 also express as shown down here in the lower right a protein called reelin which is absolutely essential for the proper formation of the six-layer structure that becomes our cortex and so this is exciting it's tantalizing we don't know yet exactly what HAR1 does but the Haussler and Vanderhaeghen and several other labs are trying to figure out its role in human brain development so as I alluded to many of the human accelerated regions look like they're regulatory sequences and one of the big jobs in my lab right now is to try to figure out which ones are and what genes they're regulating this is the underlying model you have a gene that's off you have a sequence nearby which we call an enhancer that can turn a gene on if a transcription factor comes and binds to it and that leads to production eventually of the protein from that gene so here's an example of one HAR152 we've shown through a bunch of bioinformatic analyses that HAR152 harbors a binding site for a transcription factor called PAX6 and it's able to regulate the expression of a gene called neurogenesis important for the development of the neural tube in the central nervous system and through experiments of a type we're going to hear even more about in the next talk from Jim Noonan and we've heard a little bit about already you can take a gene that glows blue 
in a mouse embryo and you can stick the human or the chimp enhancer in front of that gene and you can see where the enhancer functions during development and what we've done is that exact experiment and confirmed first of all that HAR152 is an enhancer and that there are differences between the human and chimp expression patterns so we're slowly starting to build up a story linking genome to an actual trait or phenotype hopefully there are a number of others that we've validated already we're gonna hear I think in the next talk about HAR2 which is a limb enhancer it's also known as human accelerated non-coding sequence one and my lab is working on HAR34 which is a forebrain expressed enhancer so what have we learned from looking at human accelerated regions I want to emphasize that it's not all about the brain in this Scientific American article that I wrote in 2009 I talked about some interesting sequences that are involved in other parts of our biology such as our diet and nutrition emphasize again our proteins are nearly identical to chimps so to understand what makes us human I think we really need to focus on the non-coding part of the genome and trying to understand better how gene regulation works this is a very important and massive field of biology that's really helping our research to move forward and finally a human and a chimp differ at one in every hundred base pairs but we all differ at one in about every thousand base pairs so in the same way that the Neanderthal isn't that different from the human we're not that different from each other really either and so with this new technology everyone's been talking about we're actually able and will in coming years have hundreds and thousands of human genomes to compare and these exact same methods will be useful for understanding what parts of certain kinds of people's genomes are different from others understanding traits that make different people in different parts of the world different from each other and 
most importantly understanding why people at risk for different diseases have different elements in them and I think the paradigm of focusing on the non-coding genome will be very useful there too we'll find out the genes pretty quickly and then we're gonna have to start this hard work of understanding the regulatory elements so thanks very much I might just before closing make you aware of the fact that CARTA has a website that I very much encourage you to visit and one of the things you will find on the website is the Museum of Comparative Anthropogeny which is an attempt to collect shamelessly anthropocentrically all the information we have that points to differences between humans as opposed to our other closest living relatives and MOCA is a publicly accessible site that lists domains 24 of them ranging from anatomy to social organization and they include genetics and genetics is painfully incomplete at this point but it does list 82 different genes for which we have descriptions of how they differ uniquely in humans and some of the genes talked about today you will find here I thought I'd just give you a very very quick tour starting alphabetically so there is a gene called which codes for a very small sugar difference that defines your blood type and this gene has a summary authored by Maria Saito here and you'll see font that indicates how sure he is about the statements ranging all the way from true to likely to speculative and this gene is connected to other topics in MOCA these can be in another domain such as biochemistry for example such as milk composition so you can hop from the gene for in the domain genetics to the topic of milk composition as Katie said it's not all about the brain it can also be about the milk which influences the brain and from the topic of milk composition you can find another link to another gene the CMAH mutation that does the change in sialic acids and that gets 
you back to the genetics domain with its now 82 entries which of course are totally incomplete after just having heard about thousands of gene expression differences that Yoav Gilad talked about or the segmental duplications we heard about from Evan Eichler but I encourage you to visit the CARTA website and to not only inform yourself about genetics but about the next upcoming seminars that you can find there and would like to end by reminding you that you will not find all the differences I mean there's a huge body of work that was published recently on these uniquely human deletions that you will not find there but we have many there I hope that you enjoyed this seminar and you can actually watch all the past CARTA public seminars on this site the UCSD TV and I hope you will join us for the next seminars in October December and March and I'll end by thanking our sponsors the Mathers Foundation and Annette Merle-Smith as well as all the speakers who I thank very much for making a very visible effort of translating genetics for non genetics aficionados thank you very much
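The HAR1 figures quoted in the talk (about 118 base pairs, 18 human-chimp differences, roughly one expected under the neutral model) can be turned into a rough back-of-the-envelope calculation. The sketch below is not the likelihood ratio test the speaker describes; it swaps in a much simpler Poisson approximation, assuming differences land independently at the genome-wide rate of one per hundred base pairs. The function name `poisson_upper_tail` and the constants are hypothetical choices made for this illustration.

```python
import math


def poisson_upper_tail(k, lam, extra_terms=200):
    """Approximate P(X >= k) for X ~ Poisson(lam) by summing the tail directly.

    Summing the tail (rather than computing 1 minus the lower part) keeps
    precision when the answer is extremely small.
    """
    total = 0.0
    for i in range(k, k + extra_terms):
        # Log of the Poisson probability at i, computed in log space.
        log_pmf = -lam + i * math.log(lam) - math.lgamma(i + 1)
        total += math.exp(log_pmf)
    return total


# Figures quoted in the talk.
HAR1_LENGTH = 118        # base pairs
HAR1_OBSERVED = 18       # human-chimp differences
NEUTRAL_RATE = 1 / 100   # about one difference per hundred base pairs

# Assumption for this sketch: expected count is simply length times rate.
expected = HAR1_LENGTH * NEUTRAL_RATE
p_value = poisson_upper_tail(HAR1_OBSERVED, expected)

print("expected differences under the neutral rate:", expected)
print("observed differences:", HAR1_OBSERVED)
print("approximate chance of 18 or more:", p_value)
```

Under these simplifying assumptions the chance of seeing 18 or more differences is vanishingly small, which is consistent with the speaker calling HAR1 off scale.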
Info
Channel: University of California Television (UCTV)
Views: 16,375
Keywords: anthropogeny, genetics, stem cells, genomes, neanderthal
Id: v3rYFBU9JSs
Length: 56min 11sec (3371 seconds)
Published: Tue Nov 29 2011