Ancient DNA and the New Science of the Human Past

Dr. David Reich, like many people at Harvard, does many, many things. He is a professor in the Department of Genetics at Harvard Medical School, an investigator of the Howard Hughes Medical Institute, and a senior associate member of the Broad Institute of Harvard and MIT. He received a bachelor's degree in physics from Harvard College and carried out his doctoral work at the University of Oxford, on statistical methods for learning about evolutionary history, with applications to gene mapping. His work focuses on population mixture, with applications to both medicine and human history. In medical genetics, he is best known for developing and applying methods that use the history of population mixture in African-Americans to find genetic risk factors that contribute to health disparities. He started the first state-of-the-art ancient DNA lab in the United States in 2013, and much of his current research uses the transformative power of ancient DNA to gain new insight into medical and evolutionary genetics. In 2015, Nature named him one of 10 people who matter, that is, 10 people who matter in all of the sciences, for his contribution to transforming ancient DNA data from, quote, "a niche pursuit to industrial process." We cannot possibly mention all of his contributions, but they include proving that interbreeding occurred between Neanderthals and modern humans, discovering a new archaic human population, the Denisovans, reconstructing Indian population history and risk factors for disease, and explaining the elevated risk for prostate cancer in African-Americans. And the work continues. Five years ago, scientists had retrieved just one ancient human genome from the entire Western Hemisphere. Now the total has reached 229, enabling important discoveries about how humans spread through the Americas thousands of years ago. Why do I mention this in particular?
Well, as reported in the New York Times just today, David and his team have published new research in the journal Cell on newly discovered genetic exchanges between North and South America, with significant implications for models of the peopling of the Americas. So you can see this really is ongoing, state-of-the-art research. Dr. Reich has published extensively on these topics and received many awards, including the Newcomb Cleveland Prize from the American Association for the Advancement of Science and the Dan David Prize in the archaeological and natural sciences, for his computational discovery of intermixing between Neanderthals and Homo sapiens. As I mentioned earlier, he is a member of the Allen Discovery Center for Human Brain Evolution at Boston Children's Hospital and Harvard Medical School, together with Drs. Michael Greenberg and Chris Walsh, whom some of you may have heard speak here a few weeks ago. That center aims to bring together brain science and evolutionary genetics to search for the key changes in the genome that endow humans with our unique abilities for language, art, culture, and science. I'm sure you're wishing to get me off the stage, and I'm just about to leave. Tonight, he will discuss his New York Times bestselling book, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. He will also be signing copies of the book at the table to my right after the lecture. So please join me in welcoming Professor David Reich. [APPLAUSE] Thank you all for coming. I spoke here in 2011, the year after the work we did in collaboration with Svante Paabo's laboratory in Germany on Neanderthal intermixing, and it was the best audience and the best talk I ever gave, so I'm hoping I can give another good talk, and I hope you enjoy it.
The theme of this talk is to give four examples of how genetic data are showing us that things we assumed about our past are wrong, fundamentally wrong, and how very quickly, in the last five or 10 years, this new field has overturned many assumptions about the past. If there's one take-home message, it's that mixing is in human nature: a profound mixing of very, very different populations, such that no population is, or even could be, pure in any sense. I'm going to begin by talking about the disruptive power of ancient DNA. Intellectually, I'm an academic grandchild of Luca Cavalli-Sforza, who died a couple of months ago. In 1960, he made a grand bet: that it would be possible to reconstruct the deep past by analyzing the tremendous diversity that exists in the world today. There are people speaking 7,000 or so languages around the world, an incredible, wonderful diversity. His thought was that by comparing populations around the world and seeing how closely related they are genetically, he could learn how people moved and migrated. The data available at the time were not very good: they were protein polymorphisms, like the ABO blood groups, which you can measure in blood. But he hoped that by seeing which populations are, on average, most similar at these markers, you could reconstruct which populations are most closely related to which, and build a tree of human relationships. In this way, first in 1978 and then in an analysis updated in 1993, he and his colleagues drew maps of how people around the world are related to each other.
What this is is a principal component analysis, updated in 1993 and redrawn in the book I just published, based on about 100 protein polymorphisms, like the ABO blood group or the Rhesus blood group, where people differ. By measuring the frequencies of these types in diverse European populations and looking for the primary gradients of genetic variation among them, he found that the main gradient in the data ran from southeast to northwest. He interpreted this as evidence of the spread of farming into Europe, which we know from archaeology spread from the far southeast, Turkey and Greece, all the way up to the northwest after 9,000 years ago. So his belief was that this gradient of variation was driven by the archaeologically documented spread of farming, and that the gradient shown here reflects the proportion of ancestry from the first farmers, whose ancestry was diluted as they spread and mixed with local indigenous European hunter-gatherers. But that turns out to be wrong. The truth is more like this: the gradient of farmer ancestry is almost perpendicular to the one he drew. The reason is that people moved more than expected. The movement of farmers into Europe was not the last major movement of people. There was another mass migration into Europe after 9,000 or 8,000 years ago, mostly between 5,000 and 4,000 years ago, and it utterly transformed the ancestry of the subcontinent. I'll tell you the evidence for that a little later. Now I'm going to introduce ancient DNA, which is a powerful new scientific instrument, much like the microscope was when it was first introduced several hundred years ago.
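Cavalli-Sforza's maps rest on principal component analysis of a table of allele frequencies, with one row per population and one column per marker. As a rough sketch only, the code below extracts the first principal component, the "primary gradient", by power iteration on made-up frequencies for five hypothetical populations placed along a cline. It is not his data or his software; a real analysis would call a linear-algebra library's SVD instead of iterating by hand.

```python
import math
import random


def first_principal_component(freqs, iterations=200, seed=0):
    """Return (scores, loading) for the first principal component.

    freqs: one row per population, one column per marker (allele frequency).
    Power iteration is used here only to keep the sketch dependency-free.
    """
    n_pops, n_loci = len(freqs), len(freqs[0])
    # Center each marker (column) on its mean across populations.
    means = [sum(row[j] for row in freqs) / n_pops for j in range(n_loci)]
    x = [[row[j] - means[j] for j in range(n_loci)] for row in freqs]

    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n_loci)]
    for _ in range(iterations):
        # Multiply by X^T X without forming it: first X v, then X^T (X v).
        xv = [sum(r[j] * v[j] for j in range(n_loci)) for r in x]
        w = [sum(x[i][j] * xv[i] for i in range(n_pops)) for j in range(n_loci)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break  # no variation left to explain
        v = [c / norm for c in w]
    # Project each population onto the component: its position on the gradient.
    scores = [sum(r[j] * v[j] for j in range(n_loci)) for r in x]
    return scores, v


# Hypothetical cline: frequencies shift linearly with position along a transect.
positions = [0.0, 0.25, 0.5, 0.75, 1.0]
base = [0.2, 0.5, 0.7]
slope = [0.3, -0.2, 0.1]
freqs = [[b + t * s for b, s in zip(base, slope)] for t in positions]
scores, loading = first_principal_component(freqs)
print([round(s, 3) for s in scores])
```

The scores come out evenly spaced and ordered along the cline; the overall sign of a principal component is arbitrary, so the order may be reversed.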
The measure of the power of a new scientific instrument is what happens when you turn it to something that has never been looked at before. When the microscope was first used in the late 1600s to look at the microscopic world, people discovered cells and other structures that had not even been envisaged before. In the same way, when ancient DNA is turned to human populations that have never been analyzed before, through skeletal remains associated with ancient archaeological cultures, almost everything we see is a surprise. Ancient DNA work, the way we implement it in our laboratory, begins with human skeletal remains like this ear bone, a common type of remains we analyze. We work in a clean room, where the goal is to protect the remains from the people handling them, because the people handling them have tens or hundreds of thousands or millions of times more DNA than the ancient skeletal remains being analyzed. Even a little touching, a little contamination, will overwhelm the ancient sample. So many measures, such as face masks, ultraviolet radiation, cleansing with bleach, and positive air pressure, are used to protect the samples. We prepare the ear bones or other skeletal remains and remove the parts we think are richest in DNA; here's a cochlea from the ear bone, which is known to be particularly rich in DNA. We turn that into a powder, release the DNA from the powder into a solution designed to remove the protein and mineral content, and turn it into a form that can be read by one of these DNA sequencers. Those sequencers came online a little more than a decade ago and have reduced the cost of sequencing about a millionfold compared with 20 years ago.
With this pipeline, we can generate data. In 2013, I wanted to focus on studying large numbers of ancient human skeletal samples, mostly from the last 10,000 years. I got my training in ancient DNA from Svante Paabo in Germany, who developed many of the techniques of ancient DNA analysis and led the work to sequence the Neanderthal genome and other archaic genomes, work I was privileged to be involved in as someone analyzing the data to learn about population mixture and history. But they were analyzing one, two, three, or four extremely important samples, not large numbers of samples, and to study variation and population history we need large numbers of samples. In 2015, a very important paper by a group in Denmark was published, which was almost the largest study one could conceivably do with the brute-force technology of extracting DNA from ancient skeletal remains and simply sequencing it. The problem is that when you extract DNA from ancient remains, most of it is not human. Even though sequencing is relatively inexpensive, and even though the technology has advanced to the point that we can routinely get DNA from ancient skeletal remains, most of the material, often 95% or 99% of it, is microbial: DNA from the bacteria, fungi, and other microorganisms that were around the skeletal remains when the individual died. As a result, it's prohibitively expensive to study large numbers of samples. That study analyzed about 100 individuals who lived about 4,000 or 5,000 years ago in Russia and Europe. The x-axis, at the bottom, is the amount of sequencing that was actually done: typically about a billion DNA sequences per sample, at a cost of about $10,000 per sample.
So this was a $1 million experiment, and it would be hard to scale it much higher. The approach we decided to invest in when we started our ancient DNA laboratory here was to borrow a trick from medical genomics, where many people were beginning to isolate the part of the DNA most interesting for analysis. In medical genomics, you often want to study all the genes in the genome, which make up only about 2% of it. So you wash your DNA sample over a set of artificially synthesized bait sequences: 50-letter-long strings of A's, C's, G's, and T's, printed out using what is essentially a DNA printer and put in a liquid. You wash your DNA sample over these baits and sequence only what sticks to them. In medical genetics, the baits typically cover all the genes, and you end up sequencing just that part of the genome. What we decided to do was to target not all the genes, but a little more than a million places in the genome that are known to be informative about human history and biology because people vary at them. So we developed a liquid cocktail of baits for more than a million positions in the genome, washed our DNA samples over it, and sequenced only those parts. That meant we were sequencing only human DNA, which gave a 10- to 100-fold enrichment, and only the parts of the DNA useful for our studies of history, which gave another major enrichment. The result was 10 to 100 times less sequencing, and typically higher-quality data per individual. Here is a paper we published that same year, where the cost per sample and the amount of sequencing were 10 to 100 times lower, and the quality of the data was much higher.
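The enrichment idea can be illustrated in silico, as an analogy only. The sketch below treats each sequenced fragment as an interval on a reference chromosome, tagged with a hypothetical source label, and keeps a fragment only if it is human and overlaps at least one targeted position, which is roughly what "sequence only what stuck to the baits" achieves chemically. Real capture works by hybridization to synthesized baits, not by coordinate lookup, and all positions here are made up.

```python
import bisect


def captured(fragments, targets):
    """Keep fragments that are human and overlap at least one targeted position.

    fragments: list of (source, start, end) with half-open [start, end) coordinates.
    targets: sorted list of targeted positions (e.g., sites where people vary).
    """
    kept = []
    for source, start, end in fragments:
        if source != "human":
            continue  # in this analogy, microbial DNA has nothing to bind to
        # First target at or after `start`; the fragment covers it if it lies before `end`.
        i = bisect.bisect_left(targets, start)
        if i < len(targets) and targets[i] < end:
            kept.append((source, start, end))
    return kept


targets = [100, 250, 900]
fragments = [
    ("human", 80, 130),     # covers position 100: kept
    ("human", 300, 350),    # human, but no target inside: dropped
    ("microbe", 90, 140),   # not human: dropped
    ("human", 880, 930),    # covers position 900: kept
    ("human", 200, 250),    # ends just before 250 (half-open): dropped
]
kept = captured(fragments, targets)
print(f"kept {len(kept)} of {len(fragments)} fragments")  # kept 2 of 5 fragments
```

Keeping the targets sorted and using binary search makes each lookup O(log n), which matters when the target list has more than a million entries.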
This has made it possible to sequence many, many samples. Since the laboratory started in 2013, there has been a 100-fold increase in the amount of data produced by our laboratory and others, and the number keeps increasing. With that much more data, it's possible to ask and answer questions about the past that simply cannot be answered with much smaller data sets, and I'm going to tell you four examples of that. But first, a little more background about how such data reveal the past. The type of data we're looking at can be thought of, in cartoon form, as follows. In this cartoon, a picture I made with an artist for the book, you start with a cell. On the left is a cell with 23 pairs of chromosomes. Your DNA is a sequence of approximately three billion DNA letters, adenine, cytosine, guanine, and thymine, broken up into 23 pairs of chromosomes, which are the packages that carry those letters. They're shown in the nucleus of the cell, which blows up into a pair of chromosomes: you get one of each pair from your mother and one from your father. Here is a blow-up of a small section, where you can see the double helix of DNA, made up of the four DNA letters, A, C, T, and G. The way we learn about history from this data is by studying the differences between your paternal and maternal DNA sequences, and between them and other DNA sequences. The great majority of the DNA letters in the copies your mother and father gave you are identical: where you can line them up, they're about 99.9% identical, a very high level of similarity. However, since the genome is three billion letters long, the 0.1% that differ, one thousandth of it, amount to about 3 million differences.
That's a lot of differences, and we can use them to learn about history, because your mother's and father's copies share a common ancestor at each place along your DNA. In particular, places in the genome with very few differences are places where your mother's and father's copies are quite closely related, because there hasn't been much time for random miscopying of DNA, that is, mutations, to accumulate; places with many differences are places where the two copies share a quite ancient common ancestor. The typical time since any two sequences in humans, for example your mother's and your father's, shared a common ancestor is one to two million years. But if I compare my sequence to my brother's, at some places the common ancestor might be only one generation back. So how do you use this data to learn about the past? One important thing to realize is that you're actually not one person. You contain within you a multitude of individuals. The reason is that your whole DNA sequence is packaged on 47 chunks of DNA: your 23 pairs of chromosomes, 23 times two, or 46 chromosomes, plus your mitochondrial sequence, a short bit of DNA you get from your mother. So in the current generation you start with 47 chunks of DNA. When your mother and father produced the egg and sperm that made you, they broke the DNA they in turn got from their own parents, on average about 70 times per generation, splicing together their mothers' and fathers' chromosomes to pass mixed DNA to their offspring. That process is called "recombination." So, thinking back in time, these 47 chunks fragment into another 70 or so, about 118 one generation back, and another 70 or so, about 189 two generations back: you add about 71 chunks, on average, for every generation you go back in time.
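The link between the density of differences and time is a molecular clock. As a minimal sketch, assuming a mutation rate of about 1.25 × 10^-8 per letter per generation and a generation time of 29 years (commonly used values, not figures from this talk), two copies whose common ancestor lived T generations ago differ at about 2μT of their letters, since mutations accumulate on both lineages. Counting differences in windows along a chromosome then gives a local age estimate for each window. The function names are illustrative only.

```python
def tmrca_years(differences, sites, mu=1.25e-8, generation_years=29):
    """Molecular-clock age of the common ancestor of two aligned sequences.

    Both lineages accumulate mutations, so the expected fraction of differing
    sites is 2 * mu * T for T generations; solve for T and convert to years.
    mu and generation_years are assumed values, not measurements from the talk.
    """
    per_site = differences / sites
    generations = per_site / (2 * mu)
    return generations * generation_years


def windowed_tmrca(diff_positions, chrom_length, window=1_000_000, **clock):
    """Age estimate for each window, from the positions where the two copies differ."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in diff_positions:
        counts[pos // window] += 1
    # The last window can be shorter than `window`.
    sizes = [min(window, chrom_length - k * window) for k in range(n_windows)]
    return [tmrca_years(c, s, **clock) for c, s in zip(counts, sizes)]


# Genome-wide: 0.1% of ~3 billion letters, i.e. ~3 million differences.
print(round(tmrca_years(3_000_000, 3_000_000_000)))  # 1160000
```

Under these assumed rates the genome-wide average lands at roughly a million years, in line with the range quoted above; windows with few differences point to a recent shared ancestor and windows with many to an ancient one.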
And these chunks get spread over an exponentially increasing number of ancestors: 2, 4, 8, 16, 32, 64. So by the time you're 10 or 11 generations back, you actually have fewer DNA chunks than ancestors. Some of your genealogical ancestors are not genetic ancestors at all: they haven't given you any DNA, because your DNA is getting homeopathically diluted over larger and larger numbers of ancestors. What this tells you is that each section you do inherit is a sample of ancestry from a particular ancestor. So you're not one person; you are, in fact, hundreds or thousands of ancestors, or, if you go back a few tens of thousands of years, tens or hundreds of thousands of them. With this data, you can determine exquisitely accurately how closely related your genome is to another genome by averaging over all of these independent chunks. You might think you're only one sample, one flip of a coin for measuring the probability of heads, but in fact you're 10,000 or 100,000 flips of a coin, so you can obtain exquisite measurements of how closely related two people are. That's the power of the genome, and it's why genome-wide data are so much more powerful than what you've probably heard about for several decades, mitochondrial DNA or the Y chromosome, each of which is just one section of the genome. The ancient DNA revolution that has unfolded over just the last eight or 10 years has been powered by the fact that we're no longer looking at just one section of the genome, which is what had been done for several decades and produced a number of interesting insights about the past. Now we have the multidimensional, Technicolor-rich querying that's possible by sequencing the whole genome, including from ancient samples.
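The chunk arithmetic in the talk can be checked in a few lines. This sketch uses the talk's round numbers (47 chunks today, about 71 new breaks per generation going back) as a deterministic average; it ignores randomness in recombination and assumes nothing about how chunks are distributed among ancestors. The function names are hypothetical.

```python
def chunks_at(generations_back, start_chunks=47, breaks_per_generation=71):
    """Approximate number of DNA chunks, using the talk's linear growth model."""
    return start_chunks + breaks_per_generation * generations_back


def first_generation_with_more_ancestors(max_generations=64, **model):
    """First generation back in which ancestors (2**g) outnumber DNA chunks."""
    for g in range(1, max_generations + 1):
        chunks, ancestors = chunks_at(g, **model), 2 ** g
        if ancestors > chunks:
            return g, chunks, ancestors
    return None


print(chunks_at(1), chunks_at(2))              # 118 189
print(first_generation_with_more_ancestors())  # (10, 757, 1024)
```

Because each chunk comes from a single ancestor, 757 chunks spread over 1,024 ancestors ten generations back means that, under this model, at least 267 of those ancestors contribute no DNA at all.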
OK, so the talk is built around four lessons that, for me, are lessons in humility: in each of these problems, the assumptions I came in with were wrong, and that's what we see again and again with DNA analysis. The first lesson in humility is that the ancestry of all people in the world does not trace back entirely to sub-Saharan Africa 100,000 years ago. In the '70s, '80s, and '90s, it became increasingly clear that the great majority of human ancestry comes out of Africa 50,000 to 100,000 years ago, and we still believe that. There were archaic humans outside of Africa for almost the last two million years. But anatomically modern humans, people whose skeletons look like ours, with globular braincases like this Cro-Magnon individual from Europe, have the longest record in the African skeletal record, going back 200,000 to 300,000 years. So it was thought, based on DNA analysis of present-day people, including my PhD work in the late 1990s and the work of Luca Cavalli-Sforza and many others, that all lineages trace back to Africa, and that non-African diversity is just a small sample of African diversity because non-Africans descend from an out-of-Africa population. However, Neanderthals, archaic humans who were discovered in the mid-19th century in Europe, who had brains as big as ours, made tools as sophisticated as those our own primary ancestors made, and were distributed across Europe and Western Asia, are people we know met modern humans. So when it became possible to sequence a Neanderthal genome, the question was: as modern humans, who looked like us skeletally, moved out of Africa 50,000 to 100,000 years ago and through Neanderthal territory, did they interbreed with Neanderthals? And if they interbred, did that interbreeding produce descendants who live on today? So the question was, was there interbreeding?
And if it occurred, did it occur in Europe, where Neanderthals were most densely distributed? Or did it occur elsewhere, like in the Near East, where the documentation of Neanderthals and modern humans is more scattershot and the archaeological record is less clear than in Europe about whether they interacted? And what about the archaic humans who are documented less clearly, but still definitely documented, further east? When Svante Paabo's laboratory succeeded in generating a whole genome sequence from a Neanderthal in the late 2000s, my colleague Nick Patterson and I were brought on to test whether Neanderthals match some present-day humans more than others. We developed several tests for interbreeding, and I can show you the simplest one here. This is a chromosome in cartoon form: here's a structural element of the chromosome, the centromere, and these are the long arms of the chromosome, on which tens or hundreds of millions of DNA letters are packaged. If you compare a DNA sequence from a French person and a Nigerian person and look at the 0.1% of letters that differ, for example where the French person has a thymine, a T, and the Nigerian person has a guanine, a G, you can then ask: does the Neanderthal carry the French type or the Nigerian type? When we did that analysis, we found that the Neanderthal carried the French type significantly more often: in this case, 92,000 matches to the French versus 84,000 matches to the Nigerian. If Neanderthals had separated from the common ancestor of Nigerians and French before those two populations separated from each other, and had not mixed with either afterward, we can prove that they should match the two equally often. But that's not the case.
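The matching test can be written down directly. The sketch below counts, at sites where the French and Nigerian sequences differ, how often the Neanderthal matches each, and summarizes the imbalance as a normalized difference, often called a D statistic. It is simplified: published versions also use a chimpanzee sequence to orient ancestral and derived letters and assess significance with a block jackknife, neither of which is done here. The toy sequences are made up.

```python
def match_counts(french, nigerian, neanderthal):
    """At sites where the two modern sequences differ, count which one the Neanderthal matches."""
    to_french = to_nigerian = 0
    for f, n, nea in zip(french, nigerian, neanderthal):
        if f == n:
            continue  # identical sites carry no information for this test
        if nea == f:
            to_french += 1
        elif nea == n:
            to_nigerian += 1
    return to_french, to_nigerian


def d_statistic(matches_a, matches_b):
    """Normalized excess matching; expected to be about 0 if there was no later mixture."""
    return (matches_a - matches_b) / (matches_a + matches_b)


print(match_counts("ACGTAG", "ATGCAT", "ACGCAT"))  # (1, 2)
print(round(d_statistic(92_000, 84_000), 4))       # 0.0455
```

With the counts quoted in the talk, D is about 0.045: the Neanderthal matches the French about 9.5% more often than it matches the Nigerian.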
This and other lines of evidence made it very clear that Neanderthals similar to the one we had sequenced had interbred with the ancestors of the French, and the same test, replacing the French with any other non-African population, gave a similar result. For example, Chinese people also have ancestry from Neanderthals, even a little more. We then tried to estimate the proportion of ancestry from Neanderthals in non-Africans today. We repeated the same test comparing the Nigerian and the non-African, and then replaced the non-African in the analysis with a second Neanderthal sequence. You ask: how much of the way toward the excess matching seen with a second Neanderthal does the non-African get? The answer is about 2% of the way, which means that non-Africans today have about 2% of their DNA from Neanderthals, and it's significantly more in East Asians than in Europeans; we now know why that's the case. I won't explain that in this talk, but I can explain it to you some other time. Next, we wanted to date when this interbreeding between Neanderthals and the ancestors of non-Africans occurred. To do this, we developed a statistical technique that estimates the date from the fragmentation of the genome that happens generation by generation as DNA from your mother and father is spliced together. Again, this represents two chromosomes: red here might be modern human, and green might be Neanderthal. A first-generation mixed individual will have one entirely green Neanderthal chromosome and one entirely red modern human chromosome. When that individual produces an egg or a sperm, the DNA gets broken up one or two times per chromosome per generation. We know the rate at which that fragmentation occurs.
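The "how much of the way" estimate corresponds to what the population-genetics literature calls an f4-ratio. The sketch below computes it from allele frequencies, assuming a chimpanzee-like outgroup (an assumption not mentioned in the talk) and a test population formed by a single pulse of mixture. The frequencies are made up so that the true admixture proportion is known in advance.

```python
def f4(pa, pb, pc, pd):
    """f4(A, B; C, D): average over sites of (pA - pB) * (pC - pD)."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    return sum(terms) / len(terms)


def f4_ratio(outgroup, archaic1, african, test, archaic2):
    """Fraction of the way `test` moves from `african` toward `archaic2`.

    Numerator: excess matching of archaic1 to the test population.
    Denominator: the same quantity with the test replaced by a second archaic.
    """
    return f4(outgroup, archaic1, african, test) / f4(outgroup, archaic1, african, archaic2)


# Made-up allele frequencies at four sites.
chimp = [0.0, 0.0, 0.0, 0.0]
neanderthal1 = [1.0, 0.8, 0.1, 0.9]
neanderthal2 = [0.9, 0.9, 0.2, 1.0]
african = [0.2, 0.5, 0.6, 0.1]
alpha = 0.02  # true Neanderthal proportion in the simulated non-African group
non_african = [(1 - alpha) * y + alpha * n for y, n in zip(african, neanderthal2)]
print(round(f4_ratio(chimp, neanderthal1, african, non_african, neanderthal2), 6))  # 0.02
```

Because f4 is linear in the test population's frequencies, a 2% pulse from a Neanderthal-related source makes the numerator exactly 2% of the denominator in this idealized setting; real data add sampling noise and depend on how the two archaic genomes relate to the true source.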
Today, long afterward, you can look at the chunk size of Neanderthal and modern human segments, and that tells you how long it has been since interbreeding between these two very different groups happened. So we looked at the typical size of these fragments, measured on the x-axis. In Nigerian individuals, the fragments we might detect were very tiny, consistent with shared ancestry between Nigerians and Neanderthals only very anciently. In non-Africans, however, the scale is much larger, because the French share ancestry with Neanderthals much more recently. From the size of these fragments, and knowing how fast fragmentation occurs, we were able to estimate a date. We had particular luck in 2014 in obtaining DNA sequence from a 45,000-year-old individual, a non-African from Siberia, who has huge fragments because this individual lived fairly close in time to the interbreeding event. Because we know the date of that individual from radiocarbon dating, and because the big fragments tell us the mixing occurred 5,000 to 10,000 years before the individual lived, we can estimate quite precisely that the mixture in the common ancestry of non-Africans today occurred in a narrow band between 49,000 and 54,000 years ago. So the conclusions from this set of findings between 2010 and 2014 were that there was Neanderthal interbreeding, contributing about 2% to the ancestors of all non-Africans, and that this mixed population then spread its ancestry throughout Eurasia. That's why all Eurasians have about 2% Neanderthal ancestry, with some interesting ways in which that's not exactly true that I won't tell you about.
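The dating logic can be sketched with a simple model: after a single pulse of mixture T generations ago, introgressed segments have roughly exponential lengths with a mean of about 1/T Morgans (100/T centimorgans) when the mixed-in fraction is small. The code below inverts that relationship and, for an ancient individual, adds the sample's own age. The 29-year generation time is an assumption, and published methods instead fit how ancestry correlation decays with genetic distance, so treat this as an illustration only; the segment lengths are simulated or hypothetical, not real data.

```python
import random
import statistics


def generations_since_mixture(segment_lengths_cm):
    """Invert a mean segment length of about 100 / T centimorgans to get T generations."""
    return 100.0 / statistics.mean(segment_lengths_cm)


def mixture_date_years(segment_lengths_cm, generation_years=29, sample_age_years=0):
    """Years before present: the sample's own age plus the time elapsed before it lived."""
    return sample_age_years + generations_since_mixture(segment_lengths_cm) * generation_years


# Simulate segments for a mixture 200 generations before the sample lived.
rng = random.Random(1)
true_t = 200
segments = [rng.expovariate(true_t / 100) for _ in range(100_000)]  # rate per cM
estimate = generations_since_mixture(segments)
print(f"estimated {estimate:.0f} generations (true value {true_t})")

# Hypothetical 0.5 cM mean length in a 45,000-year-old individual.
print(mixture_date_years([0.5] * 10, sample_age_years=45_000))  # 50800.0
```

Longer segments mean fewer generations have elapsed, which is why an individual who lived close in time to the mixture carries such large Neanderthal fragments.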
In the same year that we published this work on the Neanderthal genome, as we were closing the project, my colleagues in Leipzig obtained a completely unexpected DNA sequence from Denisova Cave, a site in the Altai Mountains of south-central Siberia, near the borders of Mongolia and Kazakhstan. The DNA came from a finger bone that had been labeled as possibly modern human, and 70% of its DNA was human. It was a very special sample, full of DNA, and they obtained a high-quality genome sequence from it. When we analyzed the data, we found that the Neanderthals, these four Neanderthal genome sequences, were quite closely related to each other. But this Denisova sample was very distantly related to them, though more closely related to Neanderthals than to modern humans. This separation corresponds to time, since it counts the number of mutations that have occurred, and we could show that it corresponded to a separation of many hundreds of thousands of years. So this was not another Neanderthal and not a modern human. It was something else entirely. We also had a second, much poorer-quality sample from this population, a tooth. And so we, of course, played the same game. We asked: does this Denisovan genome, named after the cave where it was excavated, match some present-day humans more than others? And we got a tremendous surprise. When we compared the Denisovan genome to two East or Southeast Asian populations, Chinese and New Guineans, we found that it matched the New Guineans much more often. This was definitely a real signal.
When we replayed the same analysis I told you about for Neanderthals, we estimated that New Guineans and some nearby populations have about 3% to 6% of their DNA from Denisovans: actually from quite distant cousins of the Denisovan from Siberia, separated from it by about 300,000 years, another group altogether but more closely related to the Siberian Denisovan than to anything else. This was a shock. In the case of Neanderthals, we had a very well-posed question set by 150 years of archaeology: we knew the archaeology of Neanderthals, these very impressive people who left behind very impressive archaeological remains, and the question was whether they interbred with modern humans. That was a question genetics could answer. With the Denisovans, it was reversed. We had no expectation at all. Instead of a fossil in search of a genome, it was a genome in search of a fossil. The question was, who were these people? We still don't know what they looked like or what tools they made. The only Denisovan DNA we have is from this one cave in Siberia, where we now have multiple genomes. Denisovan ancestry is widespread east of the Wallace Line: in New Guinea, in Australia and other places with ancestry related to New Guineans, and also in the Philippines. So the conclusion is that there was about 2% Neanderthal interbreeding, and then further interbreeding with Denisovans. We now know there was also a small, separate interbreeding event that contributed a different type of Denisovan ancestry to East Asians. And just recently, not my group but Svante Paabo's group in Germany obtained DNA sequence, published a few months ago, from another individual from Denisova Cave who had a Neanderthal mother and a Denisovan father. So a first-generation hybrid of Neanderthals and Denisovans has actually been sequenced.
In studies of modern humans from Romania, we found an individual from about 40,000 years ago who had a Neanderthal ancestor only four to six generations back. We've only studied 10 or 20 individuals from this time period, but it's very clear that there were many hybrids of very different populations, more different than any pair of human populations today, running around the world at that time. So the population structure was very different from today's. It shows that humans, even very distantly related humans, mixed again and again and again whenever they encountered each other. What these DNA sequences have done is open up a Pandora's box of mixtures between archaic humans that we now know about and have glimpses of from ancient DNA, and that were just not expected before. This is an example of humility. Before this work, I think we thought these were distant groups, separated from us by a million years or some very long time, with a simple history relative to ours. Now we know they're entwined with our own history, with each other's history, and with lineages we didn't even know about, which we have now sampled in mixed form in these populations. OK, the second lesson in humility comes closer in time, and it's a short section of my talk. It's about how the idea of "white" people, West Eurasians or Caucasians, is actually misconceived in some ways. For several hundred years, people have recognized that there's a large region of homogeneity, in how people look and in other traits, all the way from the Atlantic coast of Europe to the Near East, Iran, and Central Asia. And it's true genetically. If you look at people across this region, the average differences in the frequencies of DNA letters are quite small compared, for example, with the differences between Europeans and East Asians.
So in 2016, we obtained DNA from people across this region, West Eurasia: hunter-gatherers from the east and the west of Europe, who lived before farmers got there; farmers from the Levant in the Near East; and farmers from ancient Iran further east. With DNA from all of these groups, we could measure the frequency differences of these DNA letters between them. I'm using this color coding to show ancestry from each of these groups. We measured differentiation as the average squared frequency difference at variable DNA letters, positions where some people have an adenine, for example, and some have a cytosine. And we found that the differentiation between each pair of these four groups was as large as that between Europeans and East Asians today. So 8,000 to 10,000 years ago, in that region, there were at least four groups as different from each other as Europeans and East Asians are today. If you were able to go back in a time machine and reconstruct the population structure of the world 10,000 years ago, it would not look at all like it does today, with the relatively homogeneous population we see in this region now. Instead, it would be broken up into many groups. This is where we know it best, but presumably it was also that way in East Asia, South Asia, parts of Africa, and other parts of the world too. So how did this region of great heterogeneity, which was not anticipated from present-day data, get to be the way it is today? We learned that from studying people who lived after these hunter-gatherers and early farmers. What happened is that none of these four groups disappeared. They expanded and mixed with each other, and by 6,000 years ago that mixing had caused a homogenization of these groups, just like when you mix ingredients to bake a cake.
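The comparison described here rests on one number per pair of groups: the average squared difference in the frequency of a DNA letter across many variable positions. Below is a minimal sketch of that measure plus a toy model of why mixing homogenizes groups. Assumptions: the frequencies are invented, the mixing is a single symmetric exchange, the function names are hypothetical, and the sample-size and drift corrections that real analyses apply are ignored.

```python
import math


def mean_sq_freq_diff(freqs_a, freqs_b):
    """Average over variable positions of (p_a - p_b)^2, a simple differentiation measure."""
    if len(freqs_a) != len(freqs_b) or not freqs_a:
        raise ValueError("need two non-empty frequency lists of equal length")
    return sum((a - b) ** 2 for a, b in zip(freqs_a, freqs_b)) / len(freqs_a)


def mix(freqs_a, freqs_b, m):
    """Each group replaces a fraction m of its ancestry with ancestry from the other."""
    new_a = [(1 - m) * a + m * b for a, b in zip(freqs_a, freqs_b)]
    new_b = [(1 - m) * b + m * a for a, b in zip(freqs_a, freqs_b)]
    return new_a, new_b


# Hypothetical frequencies of one DNA letter (say, adenine) at five variable positions.
group_1 = [0.10, 0.50, 0.80, 0.30, 0.60]
group_2 = [0.60, 0.10, 0.20, 0.90, 0.10]

before = mean_sq_freq_diff(group_1, group_2)

# Under this symmetric model every per-position difference shrinks by (1 - 2m),
# so the average squared difference shrinks by (1 - 2m)^2. A threefold drop
# needs (1 - 2m)^2 = 1/3, i.e. m = (1 - 1/sqrt(3)) / 2 (about 0.21).
m_threefold = (1 - 1 / math.sqrt(3)) / 2
after = mean_sq_freq_diff(*mix(group_1, group_2, m_threefold))

print(f"before: {before:.4f}  after: {after:.4f}  ratio: {before / after:.2f}")
```

With m = 0.5 the two groups become identical, which is the limiting case of the "baking a cake" picture.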
There was a threefold reduction in differentiation by 6,000 years ago, and by 4,000 years ago, at the beginning or middle of the Bronze Age, the population had reached its present-day low level of differentiation. So "white" people are, in fact, a recent phenomenon. They're not an age-old group that existed for a long time. They're a product of profound mixture of multiple groups, as different from each other as Europeans and East Asians, that came together in this part of the world in the last 10,000 years. The third lesson in humility concerns the event that explains why Luca Cavalli-Sforza was wrong in assuming that the main gradient of variation in Europe that he was measuring reflected the movement of farmers into Europe. Until about three years ago, there was an assumption that farming was the only economic transformation of the last 10,000 years in Europe large enough to make a substantial demographic dent. The idea was that before farming, there were hunter-gatherers in Europe who did not exploit the environment very efficiently compared to farmers. So when farming was invented in the Near East, 11,000 to 12,000 years ago, those farmers would have been able to move into hunter-gatherer territory with their new technology, bringing people into Europe and displacing or mixing with the local hunter-gatherers, and that would have achieved a substantial population transformation and turnover. But once there was a densely settled farming population in Europe, it would be very difficult for later migrants to make a demographic dent. For the same reason, the Mughals and then the British controlled India politically for hundreds of years, but neither made much of a demographic dent on India. However, that turns out to be wrong for Europe, and I'm going to show you the evidence. In 2014 and 2015, almost all the ancient DNA available came from people who lived more than 5,000 years ago.
If you estimated what proportion of their ancestors people from 5,000 years ago had from the farmers of Anatolia, shown here in blue, and from hunter-gatherers, you found that people were mixtures of these two sources: mostly farmer ancestry, but some hunter-gatherer ancestry, in variable proportions in different individuals. Anatolia is the source of farming in Europe: the population that brought farming technology into Europe, as we now know from ancient DNA obtained from the actual remains at archaeological sites. That was the state of knowledge in 2014 and 2015. However, if you look at the ancestry of people in Europe today, there is a third ancestry, shown in red, which in many groups is the predominant ancestry, especially in Northern Europe; in Northeast Europe, for example, it is more than half of the ancestry. So when did this third ancestry arrive? It must have been sometime between 5,000 years ago and today, and we and others tried to figure that out. In the two or three years before this, we had made an important observation in our laboratory based only on analysis of modern individuals, and it went as follows. We developed a statistical test for whether a population is mixed, using genome-wide data, and the test works like this. We analyze hundreds of thousands of places in the genome where people are variable, where some people, for example, have an adenine, an A, and some have a cytosine. We look at the frequency of those variable letters, shown here as a pie chart, in different populations: for example, a Northern European population, a Native American population, and the Sardinian population, which is an isolated Southern European island population. In this example, the frequency of the adenine in Northern Europeans might be intermediate between that in Native Americans and that in Sardinians.
So we average over all 600,000 positions we're analyzing and ask: on average, are Northern Europeans intermediate in frequency between the two comparison populations? When they are, it's provable that Northern Europeans are a mixture of two groups related, perhaps very distantly in the past, to the two comparison populations. With this statistical test in hand, we applied it to all sorts of populations. For Northern Europeans, we applied it to all possible pairs of other populations. We got a huge signal in Northern Europeans, which is maximized when one of the populations is Sardinians. The reason it's maximized with Sardinians, we now know, is that Sardinians are a relatively isolated population descended from the first farmers of Europe, who have not been much affected by subsequent movements. So Northern Europeans are a mixture of farmers and something else. And the second population, of all people, was Native Americans. That was a real shock to us. It was definitely Native Americans: not South Asians, not Siberians, not East Asians. And of course we didn't think that Native Americans had crossed the Atlantic and moved into Europe. Instead, we proposed a new model, a hypothesis: that Northern Europeans today are a mixture of ancient farmers from the Near East and a group that we called Ancient North Eurasians, which no longer exists in unmixed form but lived somewhere in Northern Eurasia before 15,000 years ago. Sometime before 15,000 years ago, descendants of this group crossed the Bering land bridge into the Americas and contributed to the ancestry of Native Americans. And sometime after 5,000 years ago, presumably, they also contributed ancestry to Europeans. So we proposed this population.
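The "is it intermediate on average?" test can be written as a three-population (f3) statistic: the average over positions of (t - a)(t - b), where t, a and b are the frequencies of a letter in the target and the two comparison groups. A minimal sketch on invented frequencies follows. Real implementations correct for the target's finite sample size and use a block jackknife to get a standard error, both omitted here, and the function name is hypothetical.

```python
def f3(target, source_a, source_b):
    """Average of (t - a)(t - b) over variable positions.

    Negative when the target's frequencies tend to fall between those of
    the two comparison groups, which is the signature of mixture.
    """
    if not (len(target) == len(source_a) == len(source_b)) or not target:
        raise ValueError("need three non-empty frequency lists of equal length")
    total = sum((t - a) * (t - b) for t, a, b in zip(target, source_a, source_b))
    return total / len(target)


# Hypothetical allele frequencies at four variable positions.
pop_a = [0.2, 0.7, 0.4, 0.9]
pop_b = [0.8, 0.1, 0.6, 0.3]
mixed = [0.5 * a + 0.5 * b for a, b in zip(pop_a, pop_b)]  # a 50/50 mixture
drifted = [0.1, 0.8, 0.3, 1.0]  # pop_a pushed further away from pop_b

print(f3(mixed, pop_a, pop_b))    # negative: intermediate on average
print(f3(drifted, pop_a, pop_b))  # positive: no evidence of mixture (cannot rule it out)
```

The asymmetry matters: a clearly negative value demonstrates mixture, but a positive value only means this particular test found none.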
It wasn't sampled in any modern data; it was a statistical reconstruction, what we call a ghost population, something that is a statistical figment of our imagination but that we predicted existed. A year and a half later, a group working in Denmark, the same group that produced the important paper I showed you earlier, obtained DNA from this ghost population. They found it in a sample from east-central Siberia, a 24,000-year-old little boy buried near the shores of Lake Baikal, and this shows his genetic affinities. He has a strong affinity to Native Americans, shown in this heat map, and also an affinity to Europeans, but relatively little relatedness to the indigenous people who live in that same region today, because present-day indigenous people there largely descend from migrants who moved back into the region from the south after the Ice Age. So this was very exciting. And since the discovery of this ghost population in ancient DNA, many additional ghost populations have been discovered. Now I'm going to show you a still movie that reveals how the ancestry of these Ancient North Eurasians, the third ancestral population of Europe, got into Northern Europeans today. What I'm showing you here is something called a principal component analysis. The data are from about 1,000 present-day West Eurasians drawn from the locations shown here. That's Spain, that's Italy, that's the Black Sea, just to give you orientation, and the symbols show where each sample is from. Let me tell you how this analysis is done. The data consist of a table with about 600,000 rows, one for each variable position we analyzed, and about 1,000 columns, one for each individual we analyzed. So it's a 600,000 by 1,000 table.
In each cell of the table is a 0, 1, or 2, corresponding to whether that person has zero, one, or two copies of a particular variable DNA letter at that position, for example a thymine at a position where people carry either an adenine or a thymine. Then you multiply this table by its own transpose, and you get a 1,000 by 1,000 table measuring how closely related every pair of individuals is, averaged over these 600,000 positions. With this table, we can carry out principal component analysis, a mathematical technique for finding the weighted combination of DNA letters that most efficiently separates the samples from each other. So a person's position on the x-axis here might be 0.1 times the value of DNA letter 1, plus 0.1 times the value of DNA letter 2, minus 0.1 times the value of DNA letter 3, and so on. In this way, you can compute a position for every sample. The y-axis is the second most informative way to separate the samples. When you analyze the data this way, something magical happens: West Eurasians break into two parallel gradients. On one side is the Near East; on the other side is Europe. At the top are non-Mediterranean populations, and at the bottom are Mediterranean populations. There's a pretty big gap in between, filled by groups with known or plausible recent contact between Europe and the Near East, like Jewish populations or Mediterranean island populations. Right, so these are the present-day samples. Now I'm going to gray out these points and plot where the ancient samples fall. We know where they fall because we can apply the same weights (0.1 times the value of position 1, plus 0.1 times the value of position 2, and so on) to each ancient sample.
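The procedure described above (a 0/1/2 table, an individual-by-individual matrix, the most informative axes, then placing ancient samples with the same per-position weights) can be sketched in plain Python. Assumptions: the toy data are invented, each position is centered on its mean before multiplying (a standard step the talk does not mention), power iteration stands in for the optimized eigensolvers real tools use, and all function names are hypothetical. This is not the lab's software.

```python
import math
import random


def center(geno):
    """Subtract each position's mean genotype (over reference individuals) from its row."""
    means = [sum(row) / len(row) for row in geno]
    return [[g - mu for g in row] for row, mu in zip(geno, means)], means


def gram(x):
    """X^T X: an individuals-by-individuals matrix summed over positions."""
    n = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(n)] for i in range(n)]


def leading_eigen(k, iters=1000, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in k]
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in k]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(vi * sum(a * b for a, b in zip(row, v)) for vi, row in zip(v, k))
    return lam, v


def pca(geno, n_pcs=2):
    """Return per-position means, per-position weights ("loadings") and per-individual scores."""
    x, means = center(geno)
    k = gram(x)
    loadings, scores = [], []
    for _ in range(n_pcs):
        lam, v = leading_eigen(k)
        sigma = math.sqrt(lam)
        # Weight on each DNA position: the "0.1 times letter 1 ..." coefficients.
        loadings.append([sum(r * vj for r, vj in zip(row, v)) / sigma for row in x])
        scores.append([sigma * vj for vj in v])
        # Remove this axis so the next pass finds the next most informative one.
        k = [[k[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return means, loadings, scores


def project(sample, means, loadings):
    """Place a new (e.g. ancient) sample using weights learned from the reference set."""
    return [sum(w * (g - mu) for w, g, mu in zip(ws, sample, means)) for ws in loadings]


# Toy table in the talk's layout: rows are variable positions, columns are
# individuals, entries count copies (0, 1 or 2) of one DNA letter. Invented data.
modern = [
    [2, 2, 1, 0, 0, 0],
    [2, 1, 2, 0, 1, 0],
    [1, 2, 2, 0, 0, 1],
    [2, 2, 2, 1, 0, 0],
    [0, 0, 1, 2, 2, 2],
    [0, 1, 0, 2, 1, 2],
    [1, 0, 0, 2, 2, 1],
    [0, 0, 0, 1, 2, 2],
]
means, loadings, scores = pca(modern)
ancient = [2, 2, 2, 2, 0, 0, 0, 0]  # hypothetical sample, not used to build the axes
print("modern PC1:", [round(s, 2) for s in scores[0]])
print("ancient PC1/PC2:", [round(s, 2) for s in project(ancient, means, loadings)])
```

Projecting rather than refitting is the point of the design: the axes are defined by present-day people only, so ancient samples can be compared against a fixed map without distorting it.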
If you look at the hunter-gatherers of Europe, people who lived 8,000 years ago, you see they fall beyond Europe in the direction of European differentiation from the Near East, and that's because Europeans today are a mixture of hunter-gatherers and Near Easterners. These hunter-gatherers no longer exist in unmixed form in Europe because they mixed with the people from the Near East who came in with farming. Then the first farmers come, piling up on top of Sardinians, because Sardinians can be thought of as relatively unmixed descendants of those first farmers, with relatively little influx since that time. But this is where most Europeans are today. And from 8,000 to 5,000 years ago, you still don't see people whose ancestry looks like that of Europeans today. Meanwhile, in far Eastern Europe, you see a group called the Yamnaya, an archaeologically well-documented group of pastoralists that I'll tell you a little more about in a minute. They're the first people who went out into the open steppe lands, far away from the rivers, to graze their cattle, sheep, and goats. But you still don't see people like Europeans today. That only happens after 5,000 years ago, very suddenly, in association with some very well-documented archaeological cultures who made certain types of pots. After 5,000 years ago, you see people with ancestry like Europeans today. So that's when this third ancestry got into Europe, brought in by people related to these Yamnaya, who in turn had inherited it from the Ancient North Eurasians deeper in time. So in summary, Europe has been massively transformed by two mass migrations in the last 10,000 years. The first, after 9,000 years ago, brought farmers from Anatolia, in present-day Turkey, into Europe, and these are the proportions of ancestry from first farmers and hunter-gatherers in a variety of ancient samples. The second is the mass migration from the Steppe north of the Black and Caspian Seas.
The Yamnaya, this ancient culture for which we now have many samples, are an excellent statistical surrogate for this source, and they bring in this ancestry initially. It's a 70% population replacement in some parts of Europe. Then, moving forward in time, there's a rebound where farmer populations mix back in, and you get populations whose ancestry is in many cases primarily Yamnaya-related but still includes other major components. So who are these Yamnaya people? As I mentioned, this is a skeleton of one of the people we sequenced. This is a big, scary copper mace that this individual had. His head was bashed in; he was killed in a violent incident. And this is common for many of the Yamnaya individuals. The Yamnaya, as I said, were a very impressive and unique archaeological culture. They first spread over the steppe after about 5,300 years ago, and they were the first people to take advantage of two recent inventions. One was the recently domesticated horse, a profoundly important innovation, and the other was the newly invented wheel. They used horses hitched to wagons to bring supplies, including water, out into the open steppe lands far away from the river valleys, and to exploit rich grasslands that had not been exploited before. Before the Yamnaya, there were many small, different cultures spread over this region, with small settlements. After the Yamnaya, very few of those settlements persisted, and all you see left are big graves. The interpretation of many archaeologists is that these people were living in ancient versions of mobile homes and moving around the steppe. They were very successful. They expanded from where they originated across a region stretching all the way from Hungary in Central Europe to the Altai Mountains on the border of Mongolia. And many of the groups that had been there before disappeared.
Now I'm going to show you what happened with the spread of Yamnaya-related ancestry in different places on the periphery. Here's Britain. From 6,000 years ago, when farming first got to Britain from the continent, to 4,500 years ago, here is the proportion of farmer ancestry on the y-axis. And then, bang: 4,400 years ago, people with ancestry from the east arrive in Britain. It's a 90% population replacement. Stonehenge was built a little before 4,500 years ago by people who, presumably based on this plot, had entirely farmer ancestry. The big stones had just gone up. And the people in Britain today, who primarily descend from people like these with 90% ancestry from the east, are largely not descended from the people who built these monuments. Here is what happened in Iberia. It's a similar pattern: from 6,000 years ago until about 4,500 years ago, there's farmer ancestry. Then in Iberia, we can document a period of overlap for a few hundred years, where farmers and people with ancestry from the east were living side by side. After a few hundred years, they mixed together and reached an intermediate proportion. In this case, it's only about 40% ancestry from the east, a less dramatic population replacement than you see in Britain. However, if you look at these circles and their coloring, you'll see something very important. The open circles are females and the filled circles are males. For males, you can determine the Y-chromosome sequence, which is the sequence you get from your father, and you can determine whether it's typical of the Russian steppe or not. If it is, it's colored red. And what you see in Iberia is a 100% male population replacement, despite only a 40% overall DNA replacement. What that's telling you is that these males coming in from the east had preferential access to local females, again and again, displacing local males generation after generation.
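The gap between a 100% Y-chromosome replacement and about 40% overall replacement comes from how the two kinds of DNA are inherited: the rest of the genome (the autosomes) averages both parents, while the Y chromosome passes unchanged from father to son. A minimal sketch under a deliberately extreme, hypothetical scenario (not the inferred history of Iberia): an incoming male line fathers every child for several generations, always with local women. The function name and the "steppe" label are illustrative assumptions.

```python
def son(father_autosomal, father_y, mother_autosomal):
    """Autosomes average both parents; the Y chromosome is copied from the father."""
    return (father_autosomal + mother_autosomal) / 2, father_y


# Hypothetical, deliberately extreme scenario (not the inferred history of Iberia):
# an incoming man with fully eastern autosomal ancestry, and in every later
# generation his son fathers the next son with a local woman (0% eastern ancestry).
autosomal, y_type = 1.0, "steppe"
history = []
for generation in range(1, 5):
    autosomal, y_type = son(autosomal, y_type, 0.0)
    history.append((generation, autosomal, y_type))

for generation, auto, y in history:
    print(f"generation {generation}: {auto:.2%} eastern autosomal DNA, Y chromosome: {y}")
```

The eastern share of the autosomes halves each generation, while every son still carries the incoming Y chromosome, so the two measures can diverge sharply.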
And it's telling you something very profound about the nature of the population replacement that happened through people descended from these Yamnaya. We also now know something profound about how Yamnaya-related ancestry spread east. Here's a paper that we're trying to bring to publication now, where we report data from more than 500 individuals from the locations marked with squares. Here's Kazakhstan, here's Iran, here's Pakistan. We compared them to many present-day populations from South Asia to learn how Yamnaya-related ancestry spread east. With this huge sample size of ancient DNA, which makes it possible to carry out population studies, we can do something that's very difficult with small sample sizes: we can look at variability. For example, if you sampled people from the city of Tokyo and had only three samples, it perhaps wouldn't look very interesting; they might all look genetically the same. But if you have 80 samples or so, you're going to see occasional outliers, individuals with very different ancestry. They might be European, or Chinese, or Korean. And they tell you about the groups with which people in that city are in cultural contact. That's what we're seeing in towns like these, which were spread to the north of present-day India and Pakistan 4,000 years ago. Here's one of these towns, part of one of the first great civilizations of the ancient world, the Bactria-Margiana Archaeological Complex. We have 80 individuals from this town, and we can study the outliers at these sites. Most of them are genetically similar to contemporary farmers from ancient Iran, for whom we also have data. However, before 4,000 years ago, we see occasional outliers who are similar to hunter-gatherers from that region.
After 4,000 years ago, the outliers have ancestry from the Steppe, ultimately descended from these Yamnaya Steppe pastoralists. And in individuals from about 3,000 years ago from the Swat Valley in Northern Pakistan, we can see, from how finely the chunks of Steppe-derived DNA have been broken up, that the Steppe ancestry had been there for at least 500 years. So we can now limit when this Steppe ancestry was injected into South Asia to a relatively narrow window, between 4,000 and 3,500 years ago. This matters because Steppe ancestry today ranges from 0% to 30% of the ancestry of people in South Asia, and somehow it was pushed through in this window. Among these sites of the ancient Bactria-Margiana complex and other neighboring sites in Eastern Iran, we also find 14 individuals with a different type of ancestry: not from hunter-gatherers from the north, not from the Yamnaya, but South Asian-related admixture, related to present-day Southern Indians or Southeast Asians, people more similar to people in the south of India today. We interpret these individuals as migrants from the south and southeast, probably from the Indus Valley Civilization, a civilization further to the south that was contemporary with the Bactria-Margiana Archaeological Complex. So what we can see in South Asia is a history of three layers of population mixture. Today, people in South Asia are a mixture of a Steppe-related source, an Iranian farmer-related source, and a Southeast Asian-related source. People in India are a mixture of two mixed populations, which in a 2009 paper we called the "Ancestral North Indians" and the "Ancestral South Indians." Now, how did this gradient form? More than 4,000 years ago, the outlier samples at these towns are mixtures, with no Steppe ancestry, of an Iranian-related source and a Southeast Asian-related source, in variable proportions.
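The argument that the fragmentation of Steppe-derived chunks reveals how long ago mixing happened rests on recombination cutting those chunks shorter every generation. A minimal sketch using a textbook approximation for a single pulse of mixture: tracts from the incoming source have a mean length of roughly 1 / ((1 - m) g) Morgans, where m is that source's share and g the number of generations since mixing. The 28-year generation time, the inputs, and the function name are assumptions for illustration; the published dating relies on more elaborate statistics than this.

```python
def generations_since_pulse(mean_tract_morgans, fraction_from_source):
    """Single-pulse approximation: mean tract length ~ 1 / ((1 - m) * g) Morgans."""
    if mean_tract_morgans <= 0 or not 0 < fraction_from_source < 1:
        raise ValueError("need a positive tract length and 0 < m < 1")
    return 1 / ((1 - fraction_from_source) * mean_tract_morgans)


GENERATION_YEARS = 28  # assumed generation time; not a figure from the talk

# Hypothetical inputs: 20% Steppe ancestry, Steppe chunks averaging 6.25 cM (0.0625 M).
g = generations_since_pulse(0.0625, 0.20)
print(f"about {g:.0f} generations, or roughly {g * GENERATION_YEARS:.0f} years")
```

Longer surviving chunks imply a more recent mixture: doubling the mean tract length in this sketch halves the estimated number of generations.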
The Ancestral South Indians are just a point along that gradient, which we think persisted from that time. After 4,000 years ago, the Steppe-pastoralist group mixes in; we have multiple samples along that second gradient, and the Ancestral North Indians are a point along it. Mixtures of those two mixed populations after about 3,000 years ago form South Asian ancestry today. So, three gradients. Here's another way to see it, geographically. Farming was developed in the Near East 11,000 to 12,000 years ago and, after 9,000 years ago, expanded both west and east: west into Europe from Anatolia, and east into the Indus Valley from Iran. It spread across these two subcontinents of Eurasia, which are about equal in size and historically have typically had about equal populations over several thousand years. It took a few thousand years for farming to spread because farmers needed to adapt their crops to the new ecological conditions, the different temperatures and rainfall patterns, in each place. Meanwhile, on the Steppe, shown here in yellow, the Yamnaya formed. They spread to the peripheries of each of these regions, and mixtures between the Steppe people and the farmers they met in each place form the primary gradient in each region today. People across this region, in Europe and also in India and Iran, speak closely related languages, the Indo-European languages, which with some important exceptions are spoken almost uniformly across Europe, in Northern India and Iran, and in a few other places. And it's now highly likely that the spread of Indo-European languages reflects the spread of Yamnaya-related ancestry into Europe and into Central Asia, and through successor cultures further on into each of these regions, at the times documented by the ancient DNA.
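The three-source picture for South Asia amounts to saying that a group's allele frequencies look like a weighted blend of three source populations. A minimal sketch of that idea, assuming invented frequencies and a brute-force least-squares fit over weights that are non-negative and sum to 1. This is a swapped-in toy, not the method of the published work (methods such as qpAdm model mixtures through f4-statistics precisely because naive fits like this one are distorted by genetic drift); the source labels and function names are hypothetical.

```python
from itertools import product


def sse(target, sources, weights):
    """Sum of squared differences between target frequencies and a weighted blend of sources."""
    total = 0.0
    for i, t in enumerate(target):
        predicted = sum(w * src[i] for w, src in zip(weights, sources))
        total += (t - predicted) ** 2
    return total


def fit_three_way(target, sources, step=0.01):
    """Grid-search non-negative weights summing to 1 that best reproduce the target."""
    n = round(1 / step)
    best = None
    for i, j in product(range(n + 1), repeat=2):
        if i + j > n:
            continue
        weights = (i / n, j / n, (n - i - j) / n)
        err = sse(target, sources, weights)
        if best is None or err < best[0]:
            best = (err, weights)
    return best[1]


# Invented allele frequencies at six positions for three hypothetical sources.
steppe = [0.9, 0.2, 0.6, 0.1, 0.7, 0.3]
iran_farmer = [0.3, 0.8, 0.2, 0.6, 0.4, 0.9]
third_source = [0.1, 0.4, 0.9, 0.8, 0.1, 0.5]  # stands in for the talk's third source

# A target built as a known 20/50/30 blend, so the fit can be checked.
target = [0.2 * s + 0.5 * i + 0.3 * t for s, i, t in zip(steppe, iran_farmer, third_source)]
weights = fit_three_way(target, [steppe, iran_farmer, third_source])
print("estimated proportions:", weights)
```

Because the target here is an exact blend, the fit recovers the proportions; with real, drifted populations the same approach would be biased.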
So the final part of my talk is about work published today on ancestry in the Americas. In 2012, we published a paper analyzing data from diverse present-day Native Americans; here are the many present-day populations we analyzed. What we showed, and what subsequent work has shown much more richly, is that the deepest split among Native Americans today separates some northeastern Native Americans, like Algonquian-speaking groups from the northeastern United States and Canada, from southern Native Americans, which includes everybody in Central and South America. And it might have been a simple radiation from that ancestral population leading to the populations of today. In 2015, we did some more analysis of present-day data, because we didn't have any ancient DNA data yet. Pontus Skoglund, a postdoc I work closely with, did an analysis where we played the same game and asked whether New Guineans, Australians, and other people from Southeast Asia are more closely related to Native Americans from the Amazon than to Native Americans from Mexico. You might think this is a weird analysis, and it is. However, there's a very strong signal of Native Americans from Amazonia being more closely related to New Guineans, Australians, and other Southeast Asian indigenous groups than Mexicans are. Here is the degree of sharing, and you can see a group of samples from Amazonia with this sharing with Australasian groups. What we argued is that this is evidence for two founding populations of the Americas: that there was a group that contributed more to these groups in Amazonia than to other Native Americans, so that the founders were not a single source population but a mixture of at least two populations coming into the Americas early on, and that the primary ancestry of Native Americans is from a group less closely related to Australasians.
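The question "are Australasians closer to Amazonians than to Mexicans?" is the kind of comparison usually written as a four-population D-statistic. The talk does not name the statistic or the outgroup used, so the standard frequency-based form below, with the population names and frequencies all invented, is an assumption for illustration; real tests also use a block jackknife to turn D into a Z-score, omitted here.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from allele frequencies.

    Numerator: sum of (w - x)(y - z); denominator: sum of
    (w + x - 2wx)(y + z - 2yz). With Z an outgroup, D < 0 means
    Y shares more with X than with W; D near 0 means no difference.
    """
    num = sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z))
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d) for a, b, c, d in zip(w, x, y, z))
    return num / den


# Invented frequencies at five positions.
mexican = [0.3, 0.6, 0.5, 0.2, 0.7]
australasian = [0.8, 0.1, 0.9, 0.4, 0.2]
outgroup = [0.5, 0.4, 0.6, 0.5, 0.3]
# Hypothetical Amazonian group with a small (10%) Australasian-related component.
amazonian = [0.9 * m + 0.1 * a for m, a in zip(mexican, australasian)]

print(d_statistic(mexican, amazonian, australasian, outgroup))  # negative
print(d_statistic(mexican, mexican, australasian, outgroup))    # zero
```

Under a simple tree in which Mexicans and Amazonians are sister groups, D is expected to be zero; a consistent deviation is what points to an extra ancestral source.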
So in this 2015 paper, our model was that Amazonians and other Native Americans have different sources, one of them slightly more related to Australasians. Today, we and several other groups published new ancient DNA from the Americas. To give you a sense of how this is affecting the literature, the ancient DNA from the Americas amounted to only eight high-quality samples until 2017. But just this year, there's been a big jump, with almost 100 new individuals, most of them reported today. In South America, it's even more dramatic: there was only one individual until today, and now there are 51. With this tool, we can ask and answer questions about the past that were simply not possible to answer before. There were three papers on this today; I'll tell you about ours, which I know best. In our paper, we analyzed data from 49 newly reported Central and South American individuals, corresponding to the squares here, from four regions of the Americas: Belize, the Central Andes, the Southern Cone, and Brazil. Each of these time series starts at least 9,000 and up to 11,000 years ago and extends, in some cases, into the last 1,000 years. When we look at a heat map measuring how very ancient samples, like this 11,000-year-old individual from Chile, relate to present-day Native Americans, you see that this individual from Chile, the green triangle, is not particularly closely related to people from that same region today. Similarly, this 10,000-year-old individual from Brazil doesn't have any obvious relatedness to people in Brazil today, and this 7,500-year-old individual from Belize isn't obviously more related to Belizeans than to other Central and South Americans. However, between 9,000 and 6,000 years ago, that relatedness developed.
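The talk doesn't specify what the heat map plots. One common way to measure how related an ancient individual is to each present-day group is an outgroup f3 statistic, which captures the genetic drift two groups share since splitting from a distant outgroup. A minimal sketch under that assumption, with invented frequencies and hypothetical population names and functions:

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Mean of (o - a)(o - b): larger values mean more shared history between A and B."""
    return sum((o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)) / len(outgroup)


def rank_by_affinity(ancient, present_day, outgroup):
    """Sort present-day populations by outgroup f3 with one ancient sample, highest first."""
    scores = {name: outgroup_f3(outgroup, ancient, freqs) for name, freqs in present_day.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented allele frequencies at five positions.
outgroup = [0.5, 0.5, 0.5, 0.5, 0.5]
ancient = [0.8, 0.2, 0.7, 0.3, 0.9]
present_day = {
    "population_1": [0.75, 0.25, 0.65, 0.35, 0.85],  # shares history with the ancient sample
    "population_2": [0.3, 0.6, 0.4, 0.7, 0.2],
}
for name, value in rank_by_affinity(ancient, present_day, outgroup):
    print(f"{name}: {value:.3f}")
```

One row of such a heat map would be the values for one ancient individual across many present-day groups; an individual with no special local affinity shows no present-day group standing out.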
Here's a 7,700-year-old individual from Argentina, and there's clear relatedness to present-day Southern Cone populations. Here's a 6,000-year-old individual from Brazil, which begins to show relatedness to present-day Brazilians. Here's an almost 9,000-year-old individual from Peru, which is clearly more closely related to Peruvians today than individuals from before that time are. What that shows you is that beginning around these times, groups were established in each of these regions that clearly contributed to the people living there today. We developed a very simple model for how these ancient samples diverged from a common ancestral population, a simple splitting model: stem Native Americans, ancient Alaskans, the ancient northern Native Americans I told you about before, southern Native Americans, and then a rapid radiation of lineages in Central and South America, so rapid that it was as if they were moving into an empty continent. However, when we compared them to a 13,000-year-old individual from Montana, published a few years ago by another group and associated with what's known as the Clovis culture, we found that our three earliest samples, from Belize, Chile, and Brazil, were distinctly more closely related to it than the later samples were. That told us that these early groups had ancestry from a group related to this individual that was later largely displaced. The Clovis culture is the first widely dispersed culture of North America; it dates to a little before 13,000 years ago and made very distinctive points. And there's always been a question: was the spread of Clovis an event that also had an impact on South America?
Of course the culture didn't move into that region, but spreads of people that occurred around the same time, and perhaps were related to each other, might have affected some contemporary sites associated, for example, with the Fishtail culture in South America. Our data answer that question by showing that the spread of people associated with the Clovis culture, at least as represented by that one sample, also affected South and Central America. With these data in hand, we can come up with a more complicated, richer, and more informative model of what happened. Because the paper came out just today, I didn't have time to make a pretty slide, so this is a little busy. Here is the model, with the Clovis individual and how it's related. The dotted lines show the mixture events. Blue shows the individuals who have ancestry from it, our three earliest individuals. The earliest samples in Chile, Brazil, and Belize have specific relatedness to this individual, suggesting that the spread of people that propelled the Clovis expansion also had a larger impact across the Americas. After that, beginning around 9,000 years ago, the affinity to the Clovis genome disappears. That implies a previously unknown, large-scale population turnover that nobody had predicted, which affected Chile, parts of Brazil, and maybe other places besides. So this points to a really important event, and it would be very interesting to see what it correlates with archaeologically. We also have data from previously published work on the California Channel Islands, and they allow us to show that another movement between South and North America spread widely over the Central Andes about 4,000 years ago. And finally, we don't detect the "Population Y" ancestry related to these Amazonians, but another paper published today seems to detect it. So that's probably a fourth source of ancestry.
These data answer some questions, such as whether the spread of Clovis also affected the people of South America, but they raise more questions than they answer. What archaeological events were associated with these major population turnovers? What happened? And which, if any, of these groups were the Americans documented at sites that predate Clovis, 14,500 years ago or more, for example in Chile? I think what's going to happen in the coming years is that we're going to obtain DNA from more and more people. This is the DNA our laboratory has generated and published so far. I told you about the new work in India, but what you notice here is that it's very Eurocentric. Almost all the samples are from one corner of the world, which is neither the most important nor the only corner of the world, and what we've published is similar in its distribution to what other groups have published. We're trying to rectify that. Here's one of the papers I told you about, and here is the paper we published today. We're obtaining more and more data over time, with essentially a doubling every year in the amount of data we generate, and I know this is happening in other laboratories too, so it's becoming possible to imagine building an atlas of human transformations all over the world in this period. So in summary, ancient DNA is teaching us that much of what we thought we knew is wrong, because every time we use this amazing technology to look at the past, just like a microscope, it shows us profound things we didn't know about before. We're all mixed. No one is pure. And I think anybody who pays any attention to these data can no longer think that purity is something that's possible or that ever existed. It's an unusual field, where scratching the surface is guaranteed to surprise.
I was asked many times, five, six, seven, and eight years ago, to write a book about this work, and I'm sure many other people in my field were too. We didn't, because in our field the currency is scientific papers, and we don't write books. But as my colleagues came to include more and more archaeologists, linguists, and other interested people, I felt that those people needed a book in order to understand what's going on. So I wrote this book, which tries to explain what's going on in a serious but comprehensible, jargon-free way, because I think it's very important to understand this work. It has impacts on lots of other areas. Thank you very much. [APPLAUSE]
Info
Channel: Harvard Museum of Natural History
Views: 189,157
Rating: 4.7078261 out of 5
Keywords: evolution
Id: 990052wQywM
Length: 66min 18sec (3978 seconds)
Published: Mon Dec 03 2018