Thank you all for coming out tonight and giving me the opportunity to talk to a
somewhat different group of people from the ones I normally talk to about our
scientific results. As Bruce mentioned in his introduction, I wanted to start by
talking a little bit about the New York Times. Those of you who read the New York
Times science page, or the Washington Post, or The Guardian, or look at National
Geographic or any of these other publications, will have seen headlines like this.
It's a very nice article that came out about a year ago by Carl Zimmer, one of the
best science writers in the area of evolution, about reconstructing the Tree of Life.
If you were to click through and look at that article, you would see an image that
looks like this, showing the evolutionary relationships among 3,000 different species
sampled from across the globe.
And if you look closely at this picture, you'll see that most of this tree is made up
of single-celled organisms: bacteria, and another group of single-celled organisms
called archaea. The organisms that we know best, including animals, plants and fungi,
are all down in this little tiny corner here, and the animals that we identify with the
most, such as vertebrates and mammals, are so small that they don't even get a label on
this graph. Some of these pictures are quite remarkable. You can see from this one that
everything traces back to a single universal ancestor of all living things, which would
have lived about 3.5 billion years ago, as best we can tell from the fossil record and
from genetic analysis.

If you continue to look through these sorts of popular publications, you see a number
of different articles about evolution: dinosaur evolution, the evolution of influenza,
fruit fly evolution and the way that natural selection influences fruit flies, the
evolution of lice, co-evolution in mammals and dinosaurs, and of course many articles
about human evolution: the evolution of mountain populations and their adaptations to
high altitude, the peopling of Australia and of the Americas, and many articles about
one of my favorite topics, Neanderthals. One of these articles actually describes a
paper that we published a little more than a year ago, and I'm going to tell you about
that today. Now, of course, if you're less selective about your sources, you might
encounter articles like this one, which also happens to be about our paper.
I'm sorry to say this is an actual news report in The Huffington Post. Talk about a
coarsening of public discourse; it's rather embarrassing, and this was even before the
Trump era. All right, but what I want to talk about today is how we actually know this
stuff. These findings are obviously astonishing, these stories about human evolution,
about dinosaurs, about the tree of life, but how can we figure this sort of thing out
from present-day evidence? The short answer I'm going to give you is a field called
molecular phylogenetics.
"Phylogenetics" comes from the term phylogeny, introduced in the mid-19th century by the
German biologist Ernst Haeckel, which essentially means the genesis and evolution of a
phylum, or a branch of life. "Molecular" refers, for the most part these days, to the
analysis of DNA sequences, but also of protein sequences (the sequences of amino acids
that make up proteins), RNA sequences and other biomolecules. So in this talk tonight
I'm going to try to give you a sense of what this field is, how it developed over the
last 50 years or so, and then, towards the end of the talk, what it can tell us about
our own ancestry and our relationship to Neanderthals.

So, like any good academic, I'll start by establishing my credibility. I've been
studying molecular phylogenetics for a long time. I got very interested in this topic
right after college, in the early 1990s, when I was working at Los Alamos National
Laboratory studying HIV. I discovered this whole world of using computers to
reconstruct the past and became fascinated by it. Over time we've published papers
describing the evolutionary relationships among cultivated plants. We've described
processes by which bacteria transfer DNA from one strain to another, a process called
horizontal transfer.
We've studied complex families of genes that evolved by duplication and loss as well as
through speciation; we've studied small RNAs in fruit flies, and many, many other
topics. But there's a collection of core techniques that we've used again and again
throughout these analyses, and they rely on modeling the evolution of DNA sequences
along the branches of an evolutionary tree, or phylogeny. So I'm going to try to tell
you a little bit about how this area came about and how it works.

No talk about phylogenetics would be complete without this slide. How many of you have
seen this picture? All right, quite a few, maybe 20% or so. It's a bit of a famous
image; it's claimed to be the first phylogeny. It was drawn by Charles Darwin in his
famous Notebook B, in about 1837, so that's more than 20 years before the Origin of
Species was even published.
So he was doodling pictures of evolutionary trees in his notebooks. He realized, as soon
as he started to think about the processes by which one species would evolve from
another, that this would give rise to a branching structure, with more primitive
organisms at the base of the tree. As you got closer to the tips you approached the
present day, and you would have a series of branching events that would lead to a kind
of family tree among all living species today. Darwin was still quite taken with this
idea by the time he published On the Origin of Species, and this is actually the single
figure in that book. If you flip to the very back, it's not even numbered, because
there's only one figure in the book; it's mostly text.
At the very back you'll see this image here, the single figure in the Origin of Species.
At times Darwin spoke rather poetically about this image of the tree; he talked about
how the limbs divided into great branches were themselves once, when the tree was small,
budding twigs, and so on and so forth. He wasn't the first one to think in terms of
trees: you can see precursors of the idea in the work of Linnaeus and Lamarck and
others. But Darwin was the first to unify the idea of evolution with a tree, and he
realized that it would imply that all life on Earth was related by a single tree.

For many years biologists tried to build such trees, but not having DNA sequences, they
had to work with observable traits, in an area that became known as cladistics.
Biologists would identify particular characters in organisms, phenotypic or
morphological characters, and try to come up with branching relationships that would
only require each of those characters to emerge once. For example, they imagined that a
vertebral column emerged once, and that would separate a lamprey from a lancelet; then
jaws emerged once, and that would separate a tuna from a lamprey, and so on and so
forth. In that way they were able to get a pretty good idea of what the Tree of Life
might look like. But of course there were many evolutionary questions that were
difficult to resolve, parts of the tree that were hard to work out because there
weren't good physical characteristics separating one group of organisms from another.

The key development that people tend to point back to in the emergence of molecular
phylogenetics is an observation by Linus Pauling and Emile Zuckerkandl in the early
'60s. They were studying hemoglobin proteins that they had sequenced from various
different species, and they knew something about the evolutionary relationships among
these species and about how long ago they must have diverged, based on the fossil
record.
They noticed that the numbers of differences in these amino acid sequences were roughly
proportional to the estimated evolutionary time since the species had diverged. So they
introduced the idea of a molecular clock: a clock that ticks over time, laying down new
mutations in these amino acid sequences. Those mutations accumulate over time, so that
things that are more distantly related have more mutations between them, and things
that are more closely related have fewer.

Their idea looks something like this. You would have a gene in some ancestral species,
and that species would split, through some sort of speciation event, into two daughter
species. Maybe one group of organisms in the population would migrate to the other side
of a river and stop interacting with the rest of the species, and over time the two
groups would diverge from one another into two subspecies. Those subspecies would then
begin to accumulate mutations separately, so if you were to compare a protein from one
of them with the same protein from the other, you might see two mutations unique to one
and two mutations unique to the other. As time goes on, more mutations would
accumulate, perhaps additional speciation events would occur, and you'd have more and
more differences accumulating between the proteins present in these individual species.
Now if you looked at the protein from species B and species C, they would differ at only
a few places, but the proteins from B and C would each differ from the protein from
species A in many more locations.
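To make the counting idea concrete, here is a minimal Python sketch. It assumes three short, made-up, already-aligned protein fragments for species A, B and C. The sequences, the `RATE` value and the helper names are hypothetical and are chosen only for illustration. The sketch counts the positions at which each pair differs and, assuming a constant molecular clock, converts each count into a rough divergence time. The factor of 2 is there because both lineages keep accumulating changes after they split.

```python
from itertools import combinations

# Hypothetical, made-up protein fragments (one-letter amino-acid codes),
# assumed to be aligned already. B and C share a recent common ancestor;
# A branched off earlier.
SEQUENCES = {
    "A": "MKTAYIAKQR",
    "B": "MKSAYLGKQH",
    "C": "MKSAYLGRQR",
}

# Hypothetical clock rate: changes per lineage per year for this fragment.
RATE = 1e-7


def count_differences(s1: str, s2: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2))


def divergence_time(diffs: int, rate_per_lineage: float) -> float:
    """Constant-clock estimate: both lineages accumulate changes after the
    split, so diffs is roughly 2 * rate * time."""
    return diffs / (2 * rate_per_lineage)


for x, y in combinations(sorted(SEQUENCES), 2):
    d = count_differences(SEQUENCES[x], SEQUENCES[y])
    years = divergence_time(d, RATE)
    print(f"{x}-{y}: {d} differences, roughly {years / 1e6:.0f} million years")
```

With these toy sequences, B and C differ at 2 positions, while A differs from each of them at 4. That is the pattern described above. Real analyses also correct for multiple substitutions hitting the same site, which this sketch ignores.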
In this way you could start to imagine reconstructing an evolutionary tree by counting
up the numbers of differences in these proteins. That was the core idea introduced by
Zuckerkandl and Pauling. If you look at modern-day data for a particular protein, in
this case cytochrome c, from a number of different organisms, here a number of mammals,
and you plot the number of years since each pair of species diverged from a common
ancestor, as estimated from the fossil record, against the number of substitutions
between them (here they're DNA substitutions rather than amino acid substitutions, but
the principle is the same), you see an approximately linear relationship between those
two quantities. In this way, these mutations can be thought of as a kind of clock that
we can use to date the time since things diverged, and also to reconstruct the shapes of
the evolutionary trees that describe their relationships.

Zuckerkandl and Pauling's observations were really just empirical. They noticed this
property of proteins, but they didn't give a recipe for how to reconstruct the phylogeny
from this sort of data. A few years later this became a very active area of research,
and one of the pioneers was the Italian human geneticist Luca Cavalli-Sforza, who
collaborated closely with the British statistician Anthony Edwards. They came up with
the first recipes for using this sort of data to reconstruct a tree that would show how
closely different organisms were related and how long ago they might have diverged. Over
the next 10 years or so, a large number of different techniques were proposed for
reconstructing these trees. I want to
show you what one of them looks like. It turns out to be one of the most intuitive and
easy-to-understand techniques, and also one of the most powerful, for reconstructing
evolutionary relationships. It's called parsimony, because it tries to find an
evolutionary history that minimizes the number of changes required to explain the
observed data. I'll show you what I mean by that as we go.

Imagine we have three species, one, two and three, and for simplicity let's imagine we
know their ancestral sequence; maybe we can infer it by looking at a distantly related
species. We want to find the evolutionary relationship among species one, two and three.
Let's focus first on just one variable position in those sequences, known in the
literature as a site. At this particular position, species one and two have a C and
species three has an A. There are three possible evolutionary relationships among these
three species: one and two could be most closely related, with three as an out-group;
one and three could be most closely related, with two as an out-group; or two and three
could be most closely related, with one as an out-group. Those are the only possible
relationships among three species.

Now let's try to imagine the minimal sequence of mutations that could explain the
observed data, assuming that there's an A at the root of the tree. Under the first tree
we can explain the data with just one mutation, from an A to a C, along the branch
leading to species one and two. Does everybody see that? If there were a mutation there,
it would lead to a shared C in species one and two, while species three would still have
an A. If we try to explain the same data using tree two or tree three, we can only do it
with a minimum of two mutations. That doesn't necessarily mean tree two or tree three is
wrong; there will be cases where multiple mutations happen at a site. But if we
systematically see, across all positions in the data, that one tree is supported more
than the others, that gives us good reason to believe it is the true evolutionary
relationship among those species. And in
this case, if we look at the other sites, sites 2, 3 and 4, and similarly try to match
them up with each tree (I'm not going to go through all the details), we find that none
of them strongly supports one tree over another: together they require five mutations
under every tree. But we can add up the sites and ask what the total number of events is
under each tree. In this case we get six events under tree one, seven under tree two and
seven under tree three, so that gives us some confidence that tree one is most
consistent with the data. Now, in this case maybe not a whole lot of confidence; this is
not the greatest example. But in real analyses we would hope to look at hundreds or
thousands of sites and see many dozens or many hundreds of cases where one tree is
preferred over the others, and in practice that's what people do when they analyze
these data.

So here's the famous example. One of those cases I mentioned where morphological
characters were difficult to resolve a question of evolutionary relationships is the
case of the great apes: in particular, whether humans (Homo sapiens) are more closely
related to chimpanzees (Pan troglodytes) or to gorillas (Gorilla gorilla). This was a
problem that plagued taxonomists for many years, because there are many derived traits
among all three of those organisms and it wasn't clear which two were more closely
related. By the late 1980s, Goodman and his group had obtained quite a lot of sequence
data for the time, tens of thousands of DNA nucleotides from each of these species, in
the region of the beta-globin gene, and they used those data to perform the sort of
parsimony analysis I just described. They found that the best tree was the one I'm
showing here, which groups humans and chimps with gorillas as an out-group. That tree
required 383 substitution events, or nucleotide mutations, and when they mapped those to
the branches of the tree, there were quite a few, not a huge number but a significant
number, supporting the grouping of humans and chimps. These are mutations that are
shared by humans and chimps and not by the other great apes, and this gave them quite a
lot of confidence that this was the true evolutionary
relationship among these species.

Another one of my favorite examples, also a classic in the phylogenetic literature, has
to do with the cetaceans: whales, dolphins and porpoises. As many of you know, whales
are mammals, but it's not obvious how they relate to other mammals, because they're
morphologically so distinct from them. This was a problem that plagued taxonomists for
many years as well. A number of papers in the late '90s, most notably this one by a
Japanese group in 1999, obtained genetic data from toothed whales and baleen whales
along with many other mammals, and they showed very clearly that the closest living
relatives of whales were hippopotamuses. This was quite striking: the whales, dolphins
and porpoises share a common ancestor with hippopotamuses from about 50 million years
ago, and this is now fairly well supported by the fossil record as well. It appears that
this evolutionary divergence happened on the Indian subcontinent, starting probably with
a terrestrial mammal, and some time later the whale lineage made its way into the
ocean. So the fact that hippopotamuses are semi-aquatic as well is an example of
convergent evolution; it's believed from the fossil record that their ancestors were
terrestrial.

Another great figure from the early days of molecular phylogenetics was Allan Wilson,
and I want to focus on Wilson in particular because he was especially interested in the
evolution of humans and the great apes. He was a very prolific author throughout the
1960s, '70s and '80s, and one of the pioneers in obtaining molecular data from humans
and other apes and working out the relationships among them. He is also important
because he trained a number of influential people in the field, including Svante Pääbo,
one of the pioneers of Neanderthal DNA sequencing, whom I'll tell you about a little
later. Allan Wilson also trained Mary-Claire King, who discovered the BRCA1 breast cancer
gene, which some of you might have heard of. So he was a very influential person in
genetics and evolution during this period.
Actually, if you look closely, you can see in this picture that he's drawing molecular
clock plots: this is cytochrome c, there's hemoglobin, and there are a few others. He's
drawing pictures like the one I just showed you, of how proteins diverge in a roughly
linear fashion as time goes on.

Wilson and his colleagues Rebecca Cann and Mark Stoneking published a very important
paper, also in the late '80s, this time in Nature. It was really the first large-scale
study of human evolution based on mitochondrial DNA. They collected samples from 147
different people from around the world, analyzed their mitochondrial DNA, and then built
a big evolutionary tree using parsimony, like the ones I just told you about, describing
how those individuals were related. You can't see what I'm showing you there, so I'm
going to zoom in a little on a subset of these individuals. They looked at multiple
populations from around the world: Africans, Asians, Australians, Europeans. What they
found was that most of the non-African groups, such as the Europeans, formed clades,
which they called clusters, on the trees they reconstructed, but the Africans almost
invariably fell outside the variation in these non-African subgroups. That very
strongly suggested that Africa was the original source of human genetic diversity, and
that these various groups had emerged out of Africa sometime after the African diversity
had already been established. As I'll show you, this is now supported by many, many
subsequent studies. In general we see much greater genetic diversity within Africa than
in non-African populations, which typically carry subsets of the diversity that was
present in Africa before they moved out, possibly in multiple colonizations. You can see
in their abstract that they actually mention multiple origins for non-African
populations; as we'll see later in my talk, that idea has persisted to this day, and our
work tends to support it.

Another piece of this study was an estimated date for the divergence of all of these
populations: about 200,000 years ago. That turns out to be a date that also holds up
pretty well, and we'll come back to it as the talk goes on. This led to the term
"mitochondrial Eve", which some of you may have heard of. The idea is that all people on
Earth can trace their maternal inheritance back to one woman who lived in Africa about
200,000 years ago, and she would be mitochondrial Eve. I neglected to tell you, though
some of you may know this, that mitochondrial DNA is inherited only from your mother;
it's a maternally inherited molecule, whereas most of your DNA is inherited from both
parents. So when you reconstruct the history of human populations using mitochondrial
DNA, you're reconstructing only the maternal history, and these results referred only to
that.

All right, so throughout the 1990s
people continued to work hard on these phylogenetic methods for understanding human
populations, and a particular pioneer in this area was Luca Cavalli-Sforza, whom I
mentioned earlier as one of the pioneers of phylogenetic methods. By this time he was at
Stanford, and he carried out a very ambitious research program, traveling around the
world obtaining samples from people and studying their mitochondrial DNA, Y-chromosome
DNA and DNA from the rest of the genome using phylogenetic methods. He was also a pioneer
in comparing and contrasting his genetic findings with what could be learned from
linguistics, from the study of cultures, and so on. He wrote a very important book,
which I think came out in 1994, that really captured the state of the field at that
time.

I'm not going to go through the individual papers; instead I'll give you a summary of
what was known about human evolution around 2000, taken from a 2003 review article by
Cavalli-Sforza and his colleague Mark Feldman. At that time, roughly 15 years ago, it was
essentially established, to their best guess with the data available, that anatomically
modern humans had emerged around 200,000 years ago, probably in East Africa, although
some argued for South Africa. By about 100,000 years ago these groups had begun to split
and spread out across the African continent, giving rise to the different African
populations that we see today, for example the Bantu-speaking peoples of western and
central Africa and the Sān of southern Africa. Then, by around sixty or seventy thousand
years ago, one
or more waves of migration left the African continent, and these early humans began to
populate the rest of the world along several different paths: at least one southern
migration to the east, at least one northern migration to the east, and at least one
migration to the west. There are quite early remains in Australia, going back as much as
60,000 years, and there are remains of anatomically modern humans in China that also go
back 60,000 years, so these were quite early colonizations. The evidence in Europe was
for a slightly later colonization, about 40,000 years ago. And then of course the
peopling of the New World was considerably later; it required crossing the Bering Land
Bridge, probably 15 to 20 thousand years ago, and subsequent work suggests that this also
occurred in multiple waves rather than in a single wave of colonization.

All right, so this was essentially what was known at that time. Then, around 2008 or so,
the game really began to change dramatically, and it changed because of DNA sequencing
technologies.
A new type of technology for obtaining DNA sequences very cheaply and in very high
volumes began to emerge in the mid-2000s, and it became clear that we could start to
obtain complete genome sequences from individuals across the globe. The culmination of
this effort was the 1000 Genomes Project, which has now obtained genome sequences for
several thousand people from multiple populations across the globe. As this became
possible, it became clear that we no longer had to restrict these studies to
mitochondrial DNA or Y-chromosome DNA; we could study complete human genome sequences and
use those to try to understand our evolutionary history.

All right, that sounds good; more data is usually good. But it turns out that in this
case more data leads to some significant complications, and I'm going to try to give you
a sense of how the problem becomes more difficult when you look across the entire genome,
rather than just at the mitochondrial genome or just at the Y chromosome, which are
inherited as units: the Y chromosome paternally and mitochondrial DNA maternally. Okay,
so one issue is that we have two copies of
every chromosome. If you look at one of my genes, say my hemoglobin gene, I have a copy
that I inherited from my mother and a copy that I inherited from my father, and those
copies have different evolutionary histories, in the same way that my mother and my
father have different evolutionary histories. So suppose we look at a collection of
chromosomes from present-day individuals, counting backwards in time, with time zero as
the present day. We can think of each individual as having two tips in the tree: the
blue individual has a tip here and a tip there, one for the maternal copy and one for
the paternal copy of the particular gene we're looking at, and the same goes for the
green individual and the purple individual. We can then trace backwards in time and
build up a phylogeny for all of those individual chromosomes, but it's no longer at the
level of individuals; it's now at the level of chromosomes. So that's one complication:
if I build an evolutionary tree for a single gene in the genome, I have to keep track of
the fact that each individual has two copies of that gene. Where it gets really
complicated is when we think about recombination. Some of you might remember from your
high school biology class (it's okay if you don't) that when your cells go through
meiosis, the process of cell division that produces sperm and egg cells, the paternal and
maternal chromosomes swap genetic material with one another. If this is the paternal
chromosome and this is the maternal chromosome, they cross over, so some material from
the maternal chromosome ends up on the paternal chromosome and vice versa, and that
happens every generation on essentially every chromosome. What that means is that, over
time, different genes on a chromosome come to have different evolutionary histories. My
hemoglobin gene is going to have one evolutionary history for my maternal copy and
another for my paternal copy. If I then go to my cytochrome c gene, which is at a
different location in the genome, it's going to have different evolutionary histories,
because things have been shuffled by recombination. So as I move along the genome, the
tree describing the relationships among the chromosomes changes from place to place.

Now this turns out to be both good and bad. It's bad in that it makes things very
complicated to study: when I try to reconstruct evolutionary trees from population
samples of humans, I have to deal with this nasty problem of the tree changing as I go
along the chromosome. But it's good in that I'm actually sampling a much larger portion
of my ancestry. Remember, with the Y chromosome I'm only looking down one lineage: my
father, my father's father, my father's father's father, and so on, one lineage out of
all my possible ancestors, and similarly with the mitochondrial genome on my mother's
side. Across the rest of the genome, at every locus I'm sampling a different set of
ancestors, because things have been swapped around in different ways by recombination.
So it potentially gives me a lot more information about my ancestry: about how long ago
different populations diverged, about gene flow between populations, as we'll see in a
moment, and about how large ancestral populations might have been. So
let me talk a little bit about gene flow, because that's where I'm trying to take you
with this whole story, just like The Huffington Post said. Imagine that we have two
completely genetically isolated populations. Let's say they live on separate islands,
they don't have any technology for getting between the islands, and they diverged some
number of generations ago that we'll call tau. Now if I sample an individual from each
of those populations at a single locus and trace them back, they're going to find some
common ancestor, and that common ancestor will vary from one position along the genome to
the next because of historical recombination, just as I was telling you. But if those two
populations really have been completely isolated genetically, then the common ancestor
has to be at least as old as tau: when I trace back to it, it has to be in the ancestral
population, before the two were isolated from one another. However, if there has been
some gene flow between those two populations, if some of these guys have been finding
rafts and sneaking over to the other island, then I'm going to have some places along
the genome where the common ancestry is younger than the split between the two
populations. So if I look across the genome at many different locations and see that
most of the ancestry is old, but there's an occasional position with very recent
ancestry, that's a telltale sign of gene flow between the two populations.
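A toy simulation can make this signal concrete. The sketch below is not G-PhoCS or any published method. It is a minimal Python illustration built on loudly simplified assumptions:

- every population has the same constant size;
- loci are independent;
- time is counted in generations;
- gene flow happens as a single hypothetical pulse.

The parameter names and values (`TAU`, `N_E`, `admix_frac`, `admix_time`) are made up for illustration.

```python
import random

# Hypothetical parameters, in generations; not estimates from any real data.
TAU = 20_000   # split time between the two island populations
N_E = 10_000   # diploid effective size, assumed equal in every population


def pairwise_coalescence_time(rng, tau, n_e, admix_frac=0.0, admix_time=None):
    """Toy time back to the common ancestor of one chromosome from each
    population at a single locus.

    Assumes a simple coalescent: two lineages in the same population of size
    n_e coalesce at rate 1/(2*n_e) per generation. Optional gene flow is one
    pulse at admix_time < tau; looking backwards in time, it moves the lineage
    from population 2 into population 1 with probability admix_frac.
    """
    rate = 1.0 / (2 * n_e)
    if admix_time is not None and rng.random() < admix_frac:
        # Both lineages now share population 1 and may coalesce before the split.
        t = admix_time + rng.expovariate(rate)
        if t < tau:
            return t
    # Otherwise the lineages can only meet in the ancestral population.
    return tau + rng.expovariate(rate)


def fraction_younger_than_split(n_loci, tau, n_e, seed=1, **gene_flow):
    """Fraction of independent loci whose common ancestor postdates the split."""
    rng = random.Random(seed)
    times = [pairwise_coalescence_time(rng, tau, n_e, **gene_flow) for _ in range(n_loci)]
    return sum(t < tau for t in times) / n_loci


isolated = fraction_younger_than_split(10_000, TAU, N_E)
with_flow = fraction_younger_than_split(10_000, TAU, N_E, admix_frac=0.05, admix_time=2_000)
print(f"complete isolation: {isolated:.3f} of loci younger than the split")
print(f"5% pulse of gene flow: {with_flow:.3f} of loci younger than the split")
```

Under complete isolation, no locus can have a common ancestor younger than `TAU`, so the first fraction is exactly zero. With the hypothetical pulse, a few percent of loci fall below the split time. This is the kind of "occasional recent ancestry" signal described above.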
and that is essentially the signal that we look for when we study these ancient
interbreeding events okay all right now I'm going to have to start to skip over
some details because the methods that we actually use get fairly complicated but
I want to tell you at a high conceptual level essentially what we're doing so my
group got interested in this problem about seven or eight years ago and we
were we wanted to model this problem of finding common ancestry along complete
genomes allowing for it to change for the patterns of ancestry to change from
one position in the genome to the next so we set it up in the following way we
collect DNA sequences for many locations across the genome we have a
representative one or more representatives of several populations
we propose some branching relationship among those populations we can try
several if we're not sure what it is but sometimes we have enough information
from the fossil record that we have a pretty good idea of what that
relationship is so for example if these were Europeans and West Africans then
these might be South Africans we know essentially from other studies about
their general relationship with one another and then using the computer we
explore many many population trees consistent with the data across the
genome and we adjust the parameters of this model the time since these
populations diverged and the amounts of gene flow between populations until they
best fit the data we do that by exploring millions of these possible
genealogies across tens of thousands of DNA sequences drawn from the genome and
we make use of techniques drawn from statistical physics called Monte Carlo
techniques that let us in a principled way explore this space of possible
genealogies and at the end of the day the computer gives us a
model and it tells us which model best fits the data and how
much confidence we have in the individual parameters of that model all
right and some of these genealogies will involve gene flow between
populations and others won't and we can turn a knob there's a parameter that
describes how much of that gene flow there is so we can test the possibility
of gene flow or the possibility of not having gene flow okay so the reason we
were particularly interested in this is that some collaborators of ours in about 2009
obtained complete genome sequences published in 2010 from some southern
African representatives in particular we were interested in the complete genome
sequence for a representative of this hunter-gatherer population from
the Kalahari Desert known as the Khoisan or the Sān and the early work by
Cavalli-Sforza and others had shown from mitochondrial DNA and Y-chromosomal DNA
that the Sān seemed to be a very early branching group probably the earliest
branching group of all living populations on Earth today but the data
were very sparse and limited to paternal or maternal histories so we
set out to see whether we could figure out how old this population was by using
these statistical sampling techniques across the entire genome I think I
forgot to tell you the name of our program it's called G-PhoCS which
stands for Generalized Phylogenetic Coalescent Sampler so we wanted to
apply G-PhoCS to these data and see what we could say about how old the Sān were
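G-PhoCS samples jointly over genealogies at many loci and over many model parameters, which is far beyond a short example. The sketch below is a much smaller illustration of the Monte Carlo idea mentioned earlier. It uses a Metropolis-Hastings random walk to sample a single divergence time from the number of differences between two genomes. Everything here is a stated assumption rather than the G-PhoCS model:

* differences follow a Poisson distribution with mean 2μLt;
* the prior on t is uniform;
* ancestral polymorphism is ignored, so t is really the time to the common ancestor rather than the population split;
* the data values are hypothetical.

```python
import math
import random


def log_likelihood(t_gen: float, k: int, mu: float, n_sites: int) -> float:
    """Poisson log-likelihood of k differences after t_gen generations on each of two lineages."""
    lam = 2.0 * mu * n_sites * t_gen  # both lineages accumulate mutations since the ancestor
    if lam <= 0.0:
        return -math.inf
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def sample_divergence_time(k, mu, n_sites, t_max, n_iter=20_000, step=500.0, seed=1):
    """Metropolis-Hastings samples of t (generations) under a uniform prior on (0, t_max)."""
    rng = random.Random(seed)
    t = t_max / 2.0
    ll = log_likelihood(t, k, mu, n_sites)
    samples = []
    for _ in range(n_iter):
        proposal = t + rng.gauss(0.0, step)  # symmetric random-walk proposal
        if 0.0 < proposal < t_max:  # proposals outside the prior's support are rejected
            ll_new = log_likelihood(proposal, k, mu, n_sites)
            # Accept with probability min(1, L_new / L_old); 1 - random() lies in (0, 1].
            if math.log(1.0 - rng.random()) < ll_new - ll:
                t, ll = proposal, ll_new
        samples.append(t)
    return samples


# Hypothetical data: 200 differences in 1 Mb, mutation rate 1.25e-8 per site per generation.
draws = sample_divergence_time(k=200, mu=1.25e-8, n_sites=1_000_000, t_max=50_000)
kept = draws[5_000:]  # discard burn-in
mean_t = sum(kept) / len(kept)
print(round(mean_t))
```

With these made-up inputs the samples should center near 8,000 generations, close to k / (2μL). The spread of the retained samples plays the role of the confidence in the parameter.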
so the way we did this was as follows at the time there were only a few complete
genome sequences for multiple populations across the globe but
we had a Korean individual, a Han Chinese individual, a European individual, a West
African Yoruban individual and a Sān individual and we assumed
the tree that I'm showing here which was based on Cavalli-Sforza's
data and other data we could also test alternative trees and make sure that
this was the one that fits the data best and we allowed for gene flow between
some of these populations and then we tried to see whether we could estimate
how old these splits were between the different groups and we focused in
particular on two splits the split between the Sān and the others that was the one
I mentioned the very old one that we're most interested in and then the split
between the West African Yorubans and all of the non-African populations and that
would be a proxy for the time when these non African groups migrated out of
Africa and colonized the rest of the world that would give us a pretty good
estimate of when that colonization event might have happened that's known as the
Out of Africa migration and what we came up with after very careful
analysis over many many days were the following estimates we estimated the age
of the Sān split to be about 200,000 years ago now that's really
pretty old so that's as old as Allan Wilson's estimate of mitochondrial Eve
so the Sān according to our estimates go back about as far as mitochondrial Eve
would go back that's actually not surprising mitochondrial Eve is the
maternal ancestor but for reasons I won't go into it's not too
surprising that the maternal ancestor would be close to the divergence time of
that Sān split so that was encouraging our estimate of the Out of
Africa event the African-Eurasian divergence or AE divergence was seventy to
eighty thousand years and that fit fairly well
with archaeological findings in the Middle East and with a number of other
arguments people had made on the basis of both genetic and
archaeological evidence so we were quite encouraged by these findings
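Estimates like these rest on converting genetic divergence into years with an assumed mutation rate and generation time. The sketch below shows that conversion in its simplest form. It assumes a single constant clock and treats per-site divergence between two genomes as reflecting the split directly, ignoring ancestral polymorphism. The divergence value, mutation rates, and generation times are hypothetical inputs chosen only to show how sensitive the result is.

```python
def divergence_to_years(d_per_site: float, mu_per_site_per_gen: float,
                        generation_years: float) -> float:
    """Convert per-site divergence between two genomes into years.

    Both lineages accumulate mutations independently since their common
    ancestor, so d = 2 * mu * t, giving t = d / (2 * mu) generations.
    """
    generations = d_per_site / (2.0 * mu_per_site_per_gen)
    return generations * generation_years


d = 2e-4  # hypothetical per-site divergence
for mu in (1.25e-8, 2.5e-8):  # hypothetical per-site, per-generation mutation rates
    for g in (25, 30):  # hypothetical generation times in years
        print(f"mu={mu:g}, g={g}: {divergence_to_years(d, mu, g):,.0f} years")
```

Halving the assumed mutation rate doubles every date, and a longer generation time stretches every date proportionally. Under this simple conversion, all dates computed with the same assumptions are scaled by the same factor. A ratio such as the roughly threefold gap between the Sān split and the African-Eurasian split therefore does not depend on those choices.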
but they did indicate that the Sān are really quite an old population so note
that this time is about three times as long ago as this time which means the
divergence of this Sān group in southern Africa was three times as old as the
split between the West Africans and the Europeans it's a very old group there has
been some gene flow between the West Africans and the South Africans and we
can detect that in our framework but they've been remarkably isolated
probably because of their hunter-gatherer lifestyle living in the desert and
their tendency not to mix with the farming populations nearby I just wanted
to mention very briefly that there was a recent study that came out just
last week this is not yet published in a journal but it came out on Cold Spring
Harbor's preprint server known as bioRxiv this is a group that analyzed
some similar data to the data we analyzed but they combined it with some
ancient genomes some Iron Age farming genomes and some Stone Age hunter
gatherer genomes ranging between 300 and 2,000 years old so these were
remains that they dug up in South Africa they obtained DNA from these remains sequenced
that DNA and analyzed it together with modern-day genomes for a number of
different populations and they actually ran our program G-PhoCS on these data and
they also made use of their own method which analyzes only pairs of
genomes together I don't want to go into all the details of their study but
they're estimating that this date for the split of the Sān which are
here and the other African populations might be two hundred and sixty thousand
years old or even older than that I have some questions about exactly how they
did the analysis so we'll see how that holds up
when this paper is peer-reviewed but it's reasonably consistent with ours and
it's not surprising that with the acquisition of this ancient DNA the
date might get pushed back even farther one other caveat I wanted to give here
without going into a lot of detail is that this molecular clock I've been
telling you about is actually kind of a fiction there actually isn't one
molecular clock there are many molecular clocks the rate at which mutations occur
varies across human individuals and it varies quite considerably between males
and females because of the different processes by which
sperm and egg cells are generated it's age dependent in males and much less age
dependent in females what that means is that older males who become parents make a
very disproportionate contribution to the numbers of mutations that occur in their
offspring that's one of the reasons why you see a paternal age effect in
diseases like autism it's because of the higher accumulation of mutations in the
sperm cells of older males anyway I didn't want to go into all the details
here but I want to make the point that when we try to calibrate these dates
when we try to use genetic data to estimate how old populations are we're
using very crude averages over mutation rates across humans and some of these
factors have probably changed over time generation times may have changed and the
ratio of male and female ages at the time of reproduction depends on
the culture in which reproduction is occurring and so on and
so forth so that's one of the reasons why there's a lot of uncertainty about
the precise dates that we get out of these genetic analyses nonetheless we
can be fairly confident about ballpark estimates okay in the few minutes that I
have left I want to start to talk a little bit about Neanderthals and I want
to start by introducing you to Svante Pääbo who is probably the
most famous person in the field of Neanderthal genetics he's been here at
Cold Spring Harbor many times and given many talks almost always about
Neanderthal genetics and Svante has been
fascinated with ancient DNA for decades and has really dedicated most of his
career as a scientist to devising new techniques for obtaining DNA from
ancient samples correcting errors in that DNA and then analyzing that DNA to
tell us something about our history I mentioned that he worked early in his
career with Allan Wilson at Berkeley later on
he moved back to Europe and for a couple of decades now I think he's had his own
institute the Max Planck Institute in Leipzig, Germany where they do some of
the world's best work in this field of ancient DNA so Svante had been
studying Neanderthal DNA for a number of years and had some initial progress in
obtaining mitochondrial DNA from Neanderthals but also some setbacks
there had been some high profile cases where they had published what they
thought was Neanderthal DNA that turned out to be contaminated by modern human
DNA it's very difficult to avoid that sort of contamination and he went back
to the drawing board and came up with more rigorous techniques for obtaining
DNA and then finally in 2010 his team had a major breakthrough they were able
to obtain a so-called draft Neanderthal DNA sequence for an entire genome now at
this point they were not able to sequence to high coverage the genome of
a single individual they had to combine DNA from three bones that were found in
a single cave in Croatia they compared it with samples
that they had found in some other caves across Europe but by combining this
information and being very careful about DNA extraction and about sequencing and
about error correction they were able to obtain a quite good draft quality genome
for a Neanderthal and then they set about analyzing that genome and the big
story from this analysis was that there appeared to be strong evidence that
Neanderthals and modern humans had interbred probably about 60,000 years
ago I'm not going to go through all of the evidence that they presented in
favor of this hypothesis but I want to show you one finding that I think is
quite striking and fairly easy to understand if you'll bear with me for a
moment so here's what we're showing we're going to take two genome sequences a European genome
sequence and an African genome sequence and we're going to compare them to the
newly sequenced Neanderthal genome on the x-axis and to the human reference
genome on the y-axis now the human reference genome is predominantly
composed of DNA from Europeans but it's not the same European as the one we're
comparing so there's still going to be quite a few differences between
the European genome that we're using as a query and this human reference genome
and what they do for this plot is they standardize the
distances so they have an average of one so there are some overall differences
between the European and the African and how similar they are to these two
reference genomes but they're going to get rid of that by adjusting them so
they have averages of 1 now what you see when you look across the genome is that
both the European and the African mostly have a positive slope here where they're
farther away from the Neanderthal genome they're also farther
away from the human reference genome and that just reflects the fact that the
clock ticks at different rates different places across the genome so
you're accumulating mutations at different rates at different positions
across the genome and when the clock ticks faster you tend to be more distant
both from the Neanderthal genome and from the human reference and when it
ticks more slowly you tend to be closer to both but look at this strange anomaly
down at the left-hand side in the European genome so this is a collection
of sequences a small fraction of the entire genome but a significant fraction
a collection of positions across the genome that are very close to the
sequenced Neanderthal genome and very far from the
human reference okay sequences that look a lot like
Neanderthal sequences but are in a European individual and don't look
anything like the reference genome that's composed of a collection of
different people so these are sort of anomalous sequences it's like alien DNA
embedded in this European genome that looks a lot like Neanderthal sequences
and not like other European sequences and it only appears in Europeans you
don't see it in Africans it's a very strange observation and if you do this
plot with other populations from outside of Africa such as East Asians or
Americans or Papua New Guineans you see the same sort of pattern a small
fraction of sites that look a lot like Neanderthal DNA in humans all right so
I'm not going to show you the other analyses that they did but through a
whole series of analyses a large team of researchers very convincingly showed
that the only plausible explanation for the strange observation in non African
genomes is that non Africans interbred with Neanderthals probably
about 60,000 years ago after they had migrated off of the African continent we
know that it can't have happened in Africa because we see
no sign of it among African populations we also see no fossil record of
Neanderthals in Africa the Neanderthal range was predominantly
in Europe, the Middle East, and Western Asia so it would make sense that this
band that migrated off of the African continent would have encountered
Neanderthals somewhere in Eurasia and the only way we can explain this strange
observation in a fraction of their genome is if there was an interbreeding
event okay so I want to go on with the story
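The comparison behind the European-versus-African plot described earlier can be sketched in a few lines. This is an illustrative toy, not the published analysis. It assumes we already have, for each genomic window, the distance from a query genome to the Neanderthal genome and to the human reference. It standardizes each set of distances to a mean of one. It then flags windows that are unusually close to the Neanderthal yet unusually far from the reference. The cutoffs `near` and `far` and the example distances are hypothetical.

```python
from statistics import mean
from typing import Sequence


def standardize(distances: Sequence[float]) -> list[float]:
    """Rescale distances so their average is 1, removing overall differences between genomes."""
    m = mean(distances)
    return [d / m for d in distances]


def archaic_like_windows(d_neanderthal: Sequence[float], d_reference: Sequence[float],
                         near: float = 0.5, far: float = 1.5) -> list[int]:
    """Indices of windows much closer than average to the Neanderthal genome
    but much farther than average from the human reference."""
    dn = standardize(d_neanderthal)
    dr = standardize(d_reference)
    return [i for i, (a, b) in enumerate(zip(dn, dr)) if a < near and b > far]


# Hypothetical per-window distances for one query genome.
to_neanderthal = [1.0, 1.2, 0.8, 0.1, 1.0]
to_reference = [1.0, 1.1, 0.9, 2.5, 1.0]
print(archaic_like_windows(to_neanderthal, to_reference))
```

Where the clock simply ticks faster or slower, the two distances move up or down together, which produces the positive slope. Only windows that break that pattern are flagged.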
so the next chapter in this story was the discovery of a new cave so the sampling
of ancient DNA that Svante Pääbo and his team were doing was very much
limited by the quality of the DNA they were able to obtain from these
bone fragments they were analyzing many of the bone fragments that they found
that appeared to be Neanderthal bone fragments they couldn't extract any DNA
from and even the best ones were maybe one or two percent Neanderthal DNA and
mostly bacterial DNA and contamination from modern humans but then they found
this beautiful cave in Siberia in the Altai Mountains called Denisova Cave and
they teamed up with some Russian archaeologists and began to explore some
bones in that cave here it is it's quite far to
the east of these European Neanderthal findings probably on the eastern
side of the Neanderthal range but they found some beautiful bones in this cave
that had astronomically higher enrichments for Neanderthal DNA than
anything they had seen before so they found in particular this one very tiny
finger bone this is the distal manual phalanx so it's the tiny little
fingertip bone that had a very good DNA sample and when they
obtained the DNA from this sample they came up with the amazing finding that it
appeared not to be a Neanderthal it appeared to be another type of archaic
hominin so it was it was closer to a Neanderthal than it was to a modern
human but it was divergent enough from a Neanderthal that it must have been
hundreds of thousands of years separated from Neanderthals so they called that a
new subspecies or species the Denisovans named after the cave and they also found
in the same cave a toe bone probably from the fourth or fifth toe that was
very rich in Neanderthal DNA so these two samples then became the source of
the next several years of analysis of ancient DNA they both were of high
enough quality that it was possible to obtain very high-quality complete genome
sequences for a Denisovan and for another Neanderthal from these two tiny
bones in this cave all right so I can't go through all of the
findings from the analysis of these two bones but I want to
show you a summary of what was known in about 2013 after the analysis of the
complete genome sequences from these two bones so first of all you see there are
two distinct groups the Denisovans and the Neanderthals they are more closely
related to each other than either one is to modern humans but they're pretty
distantly related to one another they probably diverged hundreds of
thousands of years ago from one another in addition there was now evidence for
several different gene flow events there's the one that I just told you
about from a Neanderthal into these out of Africa populations represented by
this line here right here are Africans and here are non Africans modern humans
that event must have happened somewhere in the branch
leading to the non-Africans in addition they found evidence of gene flow from
the Denisovan into modern humans as well this evidence appears to be
confined to East Asia it's most strongly observed in Oceanian populations such as
Papua New Guineans but you see some hints of it as well in Han Chinese and
Korean populations this appears to be the result of a distinct interbreeding
event between these Denisovan individuals and a group that was
probably migrating toward Southeast
Asia in addition there was some weak signal indicating gene flow between the
Denisovans and the Neanderthals and then perhaps most interestingly there was a
sign of some as yet unknown hominin possibly Homo
erectus which is a much earlier group that is known to have lived in China
and across Eurasia this remains a mystery and something that we're interested in
working on in my group but that group has left some segments in the Denisovan genome
that appear very strange relative to the rest of the genome so there are short
segments in the Denisovan genome that don't look like anything else that we've
sequenced essentially and it's possible that that represents another
introgression event another interbreeding event a very old one but
that remains an open question okay so this is all background to the story that
I'm going to tell you about from my group very briefly and this story
involved using this program that I just told you about G-PhoCS to jointly analyze
all of the data that was available at this time so we had now three
Neanderthal genome sequences the ones from the first paper, the ones from the
second paper and a partial genome that had not yet
been published from a cave in Spain we had the Denisovan genome and then we had
a series of modern humans whose genomes had been obtained I'm using the Yoruban
in here as a placeholder but we analyzed several of them together we put them
into this G-PhoCS program that samples over all of the possible evolutionary
histories that could explain the data and after some careful analysis we came
up with the following model so G-PhoCS detected evidence of essentially all of
the gene flow events that I just told you about so for example here's the gene
flow event from Neanderthals to the Out of Africa populations here
are the gene flow events from Denisovans to East Asians and Papua New Guineans
here is that mysterious archaic hominin that might be Homo erectus introgression
here is the introgression between the Neanderthals and the Denisovans
detected at quite low levels but in addition we found another introgression
event and no matter how we did the analysis no matter how careful we were
no matter how we subsetted the data we couldn't get this one to go away and
this one is quite interesting it's going in the opposite direction it suggests
some early modern human from before the divergence of Europeans and Africans
left its imprint in the Neanderthal genomes remember the event I told you
about earlier was in the opposite direction it was Neanderthals leaving a
footprint in Out of Africa human genomes this is humans leaving a
footprint in Neanderthal genomes but it's
shared across all humans it's not present just in the Out of Africa
populations you see the same signal essentially symmetrically in all modern
humans so it must date to a time before
the divergence of these human populations and it appears only in this easternmost
Neanderthal genome the Altai Neanderthal genome so this is really a mystery how
can we explain this observation well here's our best guess at coming up with
a scenario that might describe it so first of all if we think about the
human lineage about 600,000 years ago in Africa the Neanderthals would
have branched off and they would have migrated off of the African continent
this is very early sometime later around 200,000 years ago just before these
different African groups began to split apart from one another the Sān and the
West Africans for example there must have been a group that interbred with
Neanderthals now the question is where could that have happened because we
don't think Neanderthals at that stage lived on the African continent so it
suggests maybe there was an earlier migration Out of Africa an interbreeding
event with Neanderthals perhaps in the Middle East or east of the Caspian Sea
leading to that easternmost Neanderthal lineage and then who knows what
happened to that group of modern humans that group of early modern humans we
don't see any representatives of them alive today but they could have been
absorbed by the Neanderthals they could have died out completely or they could
have migrated back and become absorbed by the other African populations we
don't know we just know that we see no sign of them and then sometime later
going back to about 65,000 years or so there would have been the main
migration Out of Africa the so called Out of Africa event and subsequently the
interbreeding event that had already been discovered by Svante Pääbo and his
colleagues in the opposite direction from Neanderthals into modern humans so
this was the subject of our paper a couple of years ago there are a lot of
questions about exactly how this could have happened but the genetic evidence
is very strong that there was at least one
interbreeding event in the other direction from early modern humans into
Neanderthals okay so I apologize for going long I'm going to wrap up there
the main point I want to make is that we can make use of
these classical molecular phylogenetic techniques to study complete genome
sequences and reconstruct human history it's computationally expensive requires
supercomputers and very sophisticated computational models but we can do it
and we can come up with new discoveries including these ancient interbreeding
events the other point I wanted to make is that simultaneously modeling all
of the data gives us a lot of useful information so most of the previous work
published by Svante Pääbo and others has looked at subsets of the data in
isolation this finding that we were able to publish a year ago was made possible
by the fact that we came up with a single model that had to explain all of
the data together and we could only see evidence of this early interbreeding
event in the opposite direction from early modern humans into Neanderthals
after we had accounted for all of the signals of the other migration events
it was only by building a holistic model that described all of the data together
that we were able to discover that event and as I mentioned we found the first
evidence of early modern human gene flow into Neanderthals and these findings suggest the
likely possibility of an earlier migration Out of Africa although we have
no other evidence to support that finding other than the timing the
inferred timing of the event so finally what's next well we're very interested
in understanding those sorts of phantom introgression events in the Denisovan
genome those hints of some very early introgression event possibly from Homo
erectus I have a student in my lab who's working very hard on trying to build
models that can detect those early events we're also very interested in
coming up with ways of detecting specific introgressed segments
in the human genome that have come from Neanderthals and
Denisovans and I didn't get a chance to talk about it but it's very interesting
to think about the possible disease-causing mutations that are out
there in modern human populations that may have been inherited from
Neanderthals because these Neanderthals had adapted to a different genetic
background they had adapted much earlier to the climate and conditions of
northern Europe and Asia and those mutations that they passed to modern
humans through this introgression event some of them were probably advantageous
but some of them were probably disease-causing mutations so there's a
lot of interest in trying to understand which mutations now linked to disease
might have come in to our populations through these introgression events okay
I'm going to stop there I'd like to thank all the members of my lab who have
contributed to this work as well as our collaborators and a number of
funding sources over the years that have allowed us to pursue these sorts of
questions thank you very much.