Ancient west Eurasian ancestry in southern and eastern Africa

Ancient west Eurasian ancestry in southern and eastern Africa
Joseph K. Pickrell, Nick Patterson, Po-Ru Loh, Mark Lipson, Bonnie Berger, Mark Stoneking, Brigitte Pakendorf, David Reich
(Submitted on 30 Jul 2013)

The history of southern Africa involved interactions between indigenous hunter-gatherers and a range of populations that moved into the region. Here we use genome-wide genetic data to show that there are at least two admixture events in the history of Khoisan populations (southern African hunter-gatherers and pastoralists who speak non-Bantu languages with click consonants). One involved populations related to Niger-Congo-speaking African populations, and the other introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We date this latter admixture event to approximately 900-1,800 years ago, and show that it had the largest demographic impact in Khoisan populations that speak Khoe-Kwadi languages. A similar signal of west Eurasian ancestry is present throughout eastern Africa. In particular, we also find evidence for two admixture events in the history of Kenyan, Tanzanian, and Ethiopian populations, the earlier of which involved populations related to west Eurasians and which we date to approximately 2,700 – 3,300 years ago. We reconstruct the allele frequencies of the putative west Eurasian population in eastern Africa, and show that this population is a good proxy for the west Eurasian ancestry in southern Africa. The most parsimonious explanation for these findings is that west Eurasian ancestry entered southern Africa indirectly through eastern Africa.

Identifiability of a Coalescent-based Population Tree Model

Identifiability of a Coalescent-based Population Tree Model
Arindam RoyChoudhury
(Submitted on 12 Apr 2013)

Identifiability of evolutionary tree models has been a recent topic of discussion and some models have been shown to be non-identifiable. A coalescent-based rooted population tree model, originally proposed by Nielsen et al. 1998 [2], has been used by many authors in the last few years and is a simple tool to accurately model the changes in allele frequencies in the tree. However, the identifiability of this model has never been proven. Here we prove this model to be identifiable by showing that the model parameters can be expressed as functions of the probability distributions of subsamples. This a step toward proving the consistency of the maximum likelihood estimator of the population tree based on this model.

Efficient moment-based inference of admixture parameters and sources of gene flow

Efficient moment-based inference of admixture parameters and sources of gene flow
Mark Lipson, Po-Ru Loh, Alex Levin, David Reich, Nick Patterson, Bonnie Berger
(Submitted on 11 Dec 2012)

The recent explosion in available genetic data has led to significant advances in understanding the demographic histories of and relationships among human populations. It is still a challenge, however, to infer reliable parameter values for complicated models involving many populations. Here we present MixMapper, an efficient, interactive method for constructing phylogenetic trees including admixture events using single nucleotide polymorphism (SNP) genotype data. MixMapper implements a novel two-phase approach to admixture inference using moment statistics, first building an unadmixed scaffold tree and then adding admixed populations by solving systems of equations that express allele frequency divergences in terms of mixture parameters. Importantly, all features of the tree, including topology, sources of gene flow, branch lengths, and mixture proportions, are optimized automatically from the data and include estimates of statistical uncertainty. MixMapper also uses a new method to express branch lengths in easily interpretable drift units. We apply MixMapper to recently published data for HGDP individuals genotyped on a SNP array designed especially for use in population genetics studies, obtaining confident results for 30 populations, 20 of them admixed. Notably, we confirm a signal of ancient admixture in European populations—including previously undetected admixture in Sardinians and Basques—involving a proportion of 20-40% ancient northern Eurasian ancestry.

Reconstructing Roma history from genome-wide data

Reconstructing Roma history from genome-wide data

Priya Moorjani, Nick Patterson, Po-Ru Loh, Mark Lipson, Péter Kisfali, Bela I Melegh, Michael Bonin, Ľudevít Kádaši, Olaf Rieß, Bonnie Berger, David Reich, Béla Melegh
(Submitted on 7 Dec 2012)

The Roma people, living throughout Europe, are a diverse population linked by the Romani language and culture. Previous linguistic and genetic studies have suggested that the Roma migrated into Europe from South Asia about 1000-1500 years ago. Genetic inferences about Roma history have mostly focused on the Y chromosome and mitochondrial DNA. To explore what additional information can be learned from genome-wide data, we analyzed data from six Roma groups that we genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). We estimate that the Roma harbor about 80% West Eurasian ancestry-deriving from a combination of European and South Asian sources- and that the date of admixture of South Asian and European ancestry was about 850 years ago. We provide evidence for Eastern Europe being a major source of European ancestry, and North-west India being a major source of the South Asian ancestry in the Roma. By computing allele sharing as a measure of linkage disequilibrium, we estimate that the migration of Roma out of the Indian subcontinent was accompanied by a severe founder event, which we hypothesize was followed by a major demographic expansion once the population arrived in Europe.

Our paper: The genetic prehistory of southern Africa

[This author post is by Joe Pickrell (@joe_pickrell), Nick Patterson, Mark Stoneking, David Reich, and Brigitte Pakendorf on The genetic prehistory of southern Africa, available from arXiv here]

The indigenous populations of southern Africa are phenotypically, linguistically, culturally, and genetically diverse. Although many groups speak Bantu languages (having arrived in the region during an expansion of Iron-Age agriculturalists), there are a number of populations who speak diverse non-Bantu languages with heavy use of click consonants. We refer to these populations as “Khoisan“. Most of the Khoisan populations are hunter-gatherers, but some are pastoralists; the extensive linguistic and cultural diversity of the Khoisan (who live in a relatively small region around the Kalahari semi-desert) is historically puzzling.

Two hunter-gatherer (or formerly hunter-gatherer) populations in East Africa, the Hadza and Sandawe, also speak languages that also make use of click consonants. Linguists see little in common between the languages in southern Africa and Hadza, although Sandawe might be genealogically related to some of the Khoisan languages. Nevertheless, the shared use of click consonants and a foraging lifestyle led many to hypothesize that the southern African Khoisan populations are genetically related to the Hadza and Sandawe, which would imply that their ancestors were once considerably more widespread. This hypothesis has been controversial for decades.

Tree relating the Khoisan-like proportion of ancestry (shown in blue in the barplot) in Khoisan, Hadza, and Sandawe after accounting for non-Khoisan admixture.

In our study, we use genetic data to address the history of the diverse groups within southern Africa and their relationship to the Hadza and Sandawe. Specifically, we genotyped individuals from 16 Khoisan populations, 5 neighboring populations that speak Bantu languages, and the Hadza (the latter thanks to Brenna Henn, Joanna Mountain, and Carlos Bustamante) on a SNP array designed for studies of human history, in that the SNP ascertainement scheme is known and includes SNPs ascertained in the Khoisan. We then merged in Hadza and Sandawe samples from a recent paper by Joseph Lachance, Sarah Tishkoff and colleagues. The main conclusions are as follows:

  1. Within the southern African Khoisan, there are two genetic groups, which correspond roughly to populations in the northwest and southeast Kalahari semi-desert. Populations from these two groups have been labeled in the tree in this post (see also Figure 1B in the preprint). We estimate that these two groups diverged within the last 30,000 years. However, this date should be taken as an upper bound due to point #2 below.
  2. All southern African Khoisan groups are admixed with non-Khoisan populations. Even the most isolated Khoisan groups (i.e. the “San” from the HGDP, who are included in the “Ju|’hoan_North” group in our paper) show some evidence of admixture with agricultualist and/or pastoralist groups. A subtle technical point is that this had not been previously noticed because methods that rely on correlations in allele frequencies are sometimes unable to detect admixture if all populations are admixed (this is related to Mr. Razib Khan’s post on why ADMIXTURE is not a test for admixure). To get around this, we developed new methods based on the decay of linkage disequilibrum.
  3. The Hadza and Sandawe trace part of their ancestry to admixture with a population related to the Khoisan. After accounting for admixture, we built a tree of “Khoisan-like” ancestry in the southern and eastern African populations (see the Figure above). The striking thing is that the Hadza and Sandawe fall with high confidence on the same branch as the Khoisan. This suggests that, prior to subsequent migrations of food-producing peoples over most of sub-Saharan Africa, populations related to the Khoisan were indeed spread continuously over a huge geographic range including Tanzania and southern Africa.

We’re excited about these results for a number of reasons. First of all, we’re now on our way towards understanding the history of the diverse Khoisan populations–for years these populations have been treated as genetically equivalent, but it’s clear that each population has its own complex history. Secondly, with the new statistical methods we’ve developed we were able to show not only the varying amounts of admixture that has occurred at different times in southern African populations, but were also able to peel away these layers of admixture to learn about the relationships among Khoisan populations that existed thousands of years ago. Finally, we think that these results have important implications for work using genetics to understand the geographic origin of modern humans within Africa. Though both southern and eastern Africa have been proposed as potential origins, from the tree in this post, we see no genetic evidence in favor of either; from our point of view this question remains open.

Joe Pickrell, Nick Patterson, Mark Stoneking, David Reich, and Brigitte Pakendorf

The geography of recent genetic ancestry across Europe

The geography of recent genetic ancestry across Europe

Peter Ralph, Graham Coop
(Submitted on 16 Jul 2012 (v1), last revised 19 Jul 2012 (this version, v2))

The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of recent genealogical ancestry over the past three thousand years at a continental scale. We detected 1.9 million shared genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 10-50 genetic common ancestors from the last 1500 years, and upwards of 500 genetic ancestors from the previous 1000 years. These numbers drop off exponentially with geographic distance, but since genetic ancestry is rare, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1000 years. There is substantial regional variation in the number of shared genetic ancestors: especially high numbers of common ancestors between many eastern populations likely date to the Slavic and/or Hunnic expansions, while much lower levels of common ancestry in the Italian and Iberian peninsulas may indicate weaker demographic effects of Germanic expansions into these areas and/or more stably structured populations. Recent shared ancestry in modern Europeans is ubiquitous, and clearly shows the impact of both small-scale migration and large historical events. Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world.

Our paper: Blood ties: ABO is a trans-species polymorphism in primates

[This author post is by Laure Ségurel [a postdoc in the Przeworski Lab] on the paper Blood ties: ABO is a trans-species polymorphism in primates, posted on the arXiv here]

The mysteries of the ABO blood group were first brought to our attention by Carole Ober. When we started working on it, we were mostly surprised by how little was known about the function of such a heavily studied gene and such an important clinical phenotype. Indeed, the expression of A, B and/or O antigens at the surface of some cells is a polymorphic phenotype shared by species as diverse as macaques and baboons in Africa, gibbons in Asia, squirrel monkeys in the Americas and, of course, humans throughout most of the world yet, many questions remain unanswered, such as: What is the biological role of ABO in different cell types? Why did Hominoids evolve toward its expression of blood cells whereas other primates express it only on epithelial/endothelial cells? Why is the O allele at such high frequency only in humans? What are the selective agents responsible for the maintenance of this polymorphism? And why did chimpanzees and bonobos apparently loose the polymorphism?

One question that we became interested in answering with population genetic tools was that of the origin of such blood types. When did the genetic polymorphism first emerge and which species share it identical by descent (as opposed to by convergent evolution)? Answers to these questions could tell us where and when having multiple alleles at this locus became advantageous. We therefore sequenced as many Hominoids, Old World monkeys and New World monkeys we could get our hands on, and, even more interestingly, we started thinking about the expectations under a model of convergent evolution, i.e., one where the AB genetic polymorphism was created independently multiple times in different species (and then maintained by balancing selection in these lineages) versus under a model of trans-species polymorphism, i.e., in which the AB genetic polymorphism arose early in time and was transmitted identical by descent to distinct species. Key to distinguishing the two predictions is the age of different selected alleles within a polymorphic population.

We therefore compared alleles within humans, orangutans, gibbons, macaques, baboons and colobus monkeys (all polymorphic species for the A and B alleles), and showed that, at least among Hominoids and among Old World monkeys, the observed genetic pattern is not compatible with a model of convergent evolution but on the contrary matches the expectations under a model of a trans-species polymorphism maintained by multi-allelic balancing selection. In other words, the data indicate that the AB polymorphism was present at least around 20 Millions of years ago, if not earlier. Also, interestingly, it seems that the A, B and O functional classes do not provide a complete description of the allelic classes natural selection is acting on, which underscores the need for more detailed functional studies of ABO sub-groups.

By submitting the paper to arXiv, we hope to circulate it to a diverse audience and without delay. In particular, we hope that the study will motivate more experimental/functional work about the role of this polymorphism in immune response, e.g., to pathogen infections.

Laure Ségurel