Our paper: The genetic prehistory of southern Africa

[This author post is by Joe Pickrell (@joe_pickrell), Nick Patterson, Mark Stoneking, David Reich, and Brigitte Pakendorf on The genetic prehistory of southern Africa, available from arXiv here]

The indigenous populations of southern Africa are phenotypically, linguistically, culturally, and genetically diverse. Although many groups speak Bantu languages (their ancestors having arrived in the region during an expansion of Iron-Age agriculturalists), there are a number of populations who speak diverse non-Bantu languages with heavy use of click consonants. We refer to these populations as “Khoisan”. Most of the Khoisan populations are hunter-gatherers, but some are pastoralists; the extensive linguistic and cultural diversity of the Khoisan (who live in a relatively small region around the Kalahari semi-desert) is historically puzzling.

Two hunter-gatherer (or formerly hunter-gatherer) populations in East Africa, the Hadza and Sandawe, also speak languages that make use of click consonants. Linguists see little in common between the languages in southern Africa and Hadza, although Sandawe might be genealogically related to some of the Khoisan languages. Nevertheless, the shared use of click consonants and a foraging lifestyle led many to hypothesize that the southern African Khoisan populations are genetically related to the Hadza and Sandawe, which would imply that their ancestors were once considerably more widespread. This hypothesis has been controversial for decades.

Tree relating the Khoisan-like proportion of ancestry (shown in blue in the barplot) in Khoisan, Hadza, and Sandawe after accounting for non-Khoisan admixture.

In our study, we use genetic data to address the history of the diverse groups within southern Africa and their relationship to the Hadza and Sandawe. Specifically, we genotyped individuals from 16 Khoisan populations, 5 neighboring populations that speak Bantu languages, and the Hadza (the latter thanks to Brenna Henn, Joanna Mountain, and Carlos Bustamante) on a SNP array designed for studies of human history: its SNP ascertainment scheme is known and includes SNPs ascertained in the Khoisan. We then merged in Hadza and Sandawe samples from a recent paper by Joseph Lachance, Sarah Tishkoff and colleagues. The main conclusions are as follows:

  1. Within the southern African Khoisan, there are two genetic groups, which correspond roughly to populations in the northwest and southeast Kalahari semi-desert. Populations from these two groups have been labeled in the tree in this post (see also Figure 1B in the preprint). We estimate that these two groups diverged within the last 30,000 years. However, this date should be taken as an upper bound due to point #2 below.
  2. All southern African Khoisan groups are admixed with non-Khoisan populations. Even the most isolated Khoisan groups (i.e. the “San” from the HGDP, who are included in the “Ju|’hoan_North” group in our paper) show some evidence of admixture with agriculturalist and/or pastoralist groups. A subtle technical point is that this had not been noticed previously because methods that rely on correlations in allele frequencies are sometimes unable to detect admixture if all populations are admixed (this is related to Mr. Razib Khan’s post on why ADMIXTURE is not a test for admixture). To get around this, we developed new methods based on the decay of linkage disequilibrium.
  3. The Hadza and Sandawe trace part of their ancestry to admixture with a population related to the Khoisan. After accounting for admixture, we built a tree of “Khoisan-like” ancestry in the southern and eastern African populations (see the Figure above). The striking thing is that the Hadza and Sandawe fall with high confidence on the same branch as the Khoisan. This suggests that, prior to subsequent migrations of food-producing peoples over most of sub-Saharan Africa, populations related to the Khoisan were indeed spread continuously over a huge geographic range including Tanzania and southern Africa.
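The dating idea behind point 2 can be illustrated with a deliberately simplified sketch. Under an idealized single-pulse model, admixture-induced linkage disequilibrium between two markers d Morgans apart decays approximately as A·exp(-t·d), where t is the number of generations since mixture. Fitting that curve therefore recovers t. This is not the method developed in the paper. It fits log(LD) against distance by ordinary least squares, drops the constant offset that practical tools usually fit, and assumes all LD values are positive. The function names and the 29-years-per-generation conversion are hypothetical choices for illustration.

```python
import math
from typing import Sequence, Tuple


def fit_single_pulse(distances_morgans: Sequence[float],
                     admixture_ld: Sequence[float]) -> Tuple[float, float]:
    """Fit admixture_ld ~ A * exp(-t * d) by least squares on log(admixture_ld).

    Hypothetical helper: returns (t, A), with t in generations since a single
    admixture pulse. Assumes every LD value is positive and ignores the
    constant offset term that real tools usually include.
    """
    if len(distances_morgans) != len(admixture_ld) or len(distances_morgans) < 2:
        raise ValueError("need at least two (distance, LD) pairs of equal length")
    if any(v <= 0 for v in admixture_ld):
        raise ValueError("log-linear fit requires strictly positive LD values")

    xs = list(distances_morgans)
    ys = [math.log(v) for v in admixture_ld]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("distances must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

    slope = sxy / sxx                  # log-LD falls linearly with slope -t
    intercept = y_mean - slope * x_mean
    return -slope, math.exp(intercept)


def generations_to_years(generations: float, years_per_generation: float = 29.0) -> float:
    """Convert generations to years; 29 years per generation is an assumed value."""
    return generations * years_per_generation


if __name__ == "__main__":
    # Synthetic, noise-free curve: a pulse 40 generations ago, sampled from
    # 0.5 cM to 30 cM (0.005 to 0.3 Morgans).
    ds = [0.005 * k for k in range(1, 61)]
    ld = [0.01 * math.exp(-40 * d) for d in ds]
    t, amplitude = fit_single_pulse(ds, ld)
    print(f"t = {t:.2f} generations, about {generations_to_years(t):.0f} years")
```

Real data add sampling noise, background LD at short distances, and an offset term. The t from a fit like this should therefore be read as an idealized summary of the admixture history, not a precise date.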

We’re excited about these results for a number of reasons. First of all, we’re now on our way towards understanding the history of the diverse Khoisan populations; for years these populations have been treated as genetically equivalent, but it’s clear that each population has its own complex history. Secondly, with the new statistical methods we’ve developed, we were able not only to show the varying amounts of admixture that have occurred at different times in southern African populations, but also to peel away these layers of admixture to learn about the relationships among Khoisan populations that existed thousands of years ago. Finally, we think that these results have important implications for work using genetics to understand the geographic origin of modern humans within Africa. Although both southern and eastern Africa have been proposed as potential origins, the tree in this post provides no genetic evidence in favor of either; from our point of view, this question remains open.

Joe Pickrell, Nick Patterson, Mark Stoneking, David Reich, and Brigitte Pakendorf

17 thoughts on “Our paper: The genetic prehistory of southern Africa”

  1. Very cool, these populations clearly need deeper study. So, are there language trees for these groups? I think it’s really interesting that the ! click may have only evolved once.

    • For the southern Africans, see Supplementary Figure 1 for the relationships between the languages. The Hadza and Sandawe languages are considered isolates (though there’s some talk of the Sandawe language having a relationship to the Khoe languages). The precise linguistic details here are a bit beyond me; I’ll see if I can get one of my linguist colleagues to give further details.

    • As Joe pointed out, we’ve got basic trees for the 3 different Khoisan language families in the Supplementary Materials. A slightly more detailed overview of the genealogies can be found in Güldemann 2008 (Southern African Humanities 20: 93-132, his Table 4).
      As to the ! click evolving only once: I’m not entirely sure whether you specifically think that the alveolar click (represented by the ! symbol) evolved only once, or that all click consonants evolved only once. There are very many different click consonants; most of the Khoisan languages of southern Africa, for example, have (way) over 30.
      Secondly, while it’s true that they are very restricted areally, it is not the case that all languages making use of clicks inherited these sounds from a common ancestor. There are several cases in Africa of unrelated languages (southeastern and southwestern Bantu languages in southern Africa as well as a Cushitic language in East Africa) having borrowed clicks through contact with peoples who spoke languages with clicks.
      Thirdly, there is one clearly attested case of completely independent innovation of click phonemes: in a language variant called Damin in Australia.
      For more information (written for a non-linguistic, non-Khoisanist audience) I would recommend Güldemann & Stoneking (2008, Annu. Rev. Anthropol. 37:93–109).

      • Yes, they’re different phonemes.
        How to count them is a rather complex issue – if you’re interested in the gory details I would rather refer you to the specialists (e.g. by passing your question on to my Khoisanist colleagues).

  2. Dr. Pickrell,

    There was an issue in the *Genetics* paper where ROLLOFF was predicting a very recent admixture for the Uyghur. This is historically defensible (short answer: the Uyghurs who are labelled Uyghurs today may not have a direct genealogical connection to the Uyghurs of the 8th century, as many of the Turkic “ethnic groups” between the Caspian and Gansu were ethnicized only within the past few centuries), though it is not the consensus in the historical literature. Now I’m rather curious about the short time frames that ROLLOFF and related methods are giving for admixture in East Africa. Some conference abstracts are also suggesting the same for India.

    Obviously you assume that the methods are giving you reasonably accurate results. But if they were to err, would you expect them to underestimate or overestimate the time since admixture? I haven’t used these methods enough to have any intuition. While I can accept the low dates inferred in any given case, I have trouble accepting their joint plausibility (perhaps this is some cognitive bias, I don’t know).

    • Yes, I think the flow into east Africa is fascinating. There have been a couple papers showing clear evidence of “west Eurasian” gene flow into east Africa sometime around 1000BC-0AD, but we have *absolutely no idea* who these people were (I saw your post about Punt; that was about all I could come up with too!).

      The ROLLOFF dates come from an idealized model where gene flow happens in a single pulse. Getting beyond this is hard (see the Ralph and Coop paper for a good discussion). My experience is that if there are multiple waves of mixture, it will tend to pick up the most recent one. So while the dates we’re getting in east Africa are probably dates of some important things, older flow is definitely plausible; we see this for some of the Khoisan groups.
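A toy calculation can illustrate why a single-pulse fit tends to report the most recent wave. The sketch below is hypothetical and is not ROLLOFF or code from the paper. It builds an LD curve from two equal-amplitude pulses, 150 and 20 generations ago (assumed values), then fits one exponential to that curve by ordinary least squares on log(LD).

For any two-pulse curve, the fitted rate must fall between the two true rates. The reason is that the slope of log(LD) at every distance is a weighted average of the two rates. With these particular parameters, the old pulse's signal has mostly decayed beyond a few cM, so the estimate lands close to the recent pulse.

```python
import math
from typing import List, Sequence


def two_pulse_ld(ds: Sequence[float], t_old: float, t_recent: float,
                 amp_old: float, amp_recent: float) -> List[float]:
    """Hypothetical admixture-LD curve from two independent pulses (d in Morgans)."""
    return [amp_old * math.exp(-t_old * d) + amp_recent * math.exp(-t_recent * d)
            for d in ds]


def single_pulse_rate(ds: Sequence[float], ld: Sequence[float]) -> float:
    """Decay rate t (generations) from an OLS fit of log(ld) on distance."""
    n = len(ds)
    ys = [math.log(v) for v in ld]
    x_mean = sum(ds) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in ds)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(ds, ys))
    return -sxy / sxx


if __name__ == "__main__":
    ds = [0.005 * k for k in range(1, 61)]   # 0.5 cM to 30 cM, in Morgans
    ld = two_pulse_ld(ds, t_old=150, t_recent=20, amp_old=0.01, amp_recent=0.01)
    t_hat = single_pulse_rate(ds, ld)
    # The single-pulse estimate lies between 20 and 150, and much nearer 20 here.
    print(f"single-pulse estimate: {t_hat:.1f} generations")
```

The point is qualitative. How far the estimate is pulled toward the older pulse depends on the pulse sizes and on the range of distances fitted. That is why older gene flow remains plausible even when the fitted date is recent.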

  3. Pingback: The brambly bush of humanity | Gene Expression | Discover Magazine

  4. There was an issue in the *Genetics* paper where ROLLOFF was predicting a very recent admixture for the Uyghur. This is historically defensible (short answer: the Uyghurs who are labelled Uyghurs today may not have a direct genealogical connection to the Uyghurs of the 8th century, as many of the Turkic “ethnic groups” between the Caspian and Gansu were ethnicized only within the past few centuries), though is not the consensus in the historical literature.

    It is not historically defensible. Whether or not modern-day Uyghurs have any direct genealogical connection to the Uyghurs of the Middle Ages (we know that there is no such connection in terms of identity, if not in terms of genetics, as the modern-day “Uyghur” identity is a 20th-century creation), it is clear from the historical records that the region inhabited by modern-day Uyghurs (the Tarim basin) was already Turkic-speaking during the centuries before the Mongol conquests (see, for instance, the information provided by the 11th-century geographer and linguist of the Turkic-speaking regions Mahmud al-Kashgari, who was himself a Turkic person from the Tarim basin). The Tocharian languages went extinct and the Tarim basin was Turkicized several centuries before the Mongol conquests.

    • Razib, Onur, I briefly followed your back and forth on this over at GNXP. Please keep it over there; this is obviously off topic for this post.

  5. I have a few trivial questions about samples:

    1. What on earth are the |=hoan? Is the linguistic affiliation solid? They don’t seem to come out right. Or is there an appendix somewhere that I did not find?

    2. Where is the Damara sample from? Are they the same as the folks called Bergdama in Namibia?

    Thanks, Henry Harpending

    • 1) We do have fairly detailed information on the samples in the Supplementary Materials, but we don’t discuss the ǂHoan.
      They are a small group from Botswana. The affiliation of their language has indeed raised some problems, with a debate in the early 1970s between Anthony Traill and E.O.J. Westphal as to whether it belongs to the Southern Khoisan (now: Tuu) or Northern Khoisan (now: Ju branch of Kx’a) languages. It was only recently affiliated with the Ju languages in the newly named Kx’a family by Heine & Honken (2010). The difficulties with classifying this language are most probably due to the contact influence it has undergone, as evidenced by lexical borrowings (Traill & Nakagawa, 2000).

      2) The Damara are indeed the people speaking dialects of Khoekhoegowab formerly referred to as Bergdamara, i.e. not Herero. The samples were collected in various locations in central and western Namibia.

  6. I am merely an interested layman, so forgive me, but the above figure seems to imply that chimpanzees are our ancestors. Is it conventional in the field to use “chimp” as a shorthand for the Pan-Homo divergence? Hopefully I am not completely misunderstanding the meaning. Thanks.

    • The chimpanzee is used to root the tree. The actual structure (hidden from view in this figure) is that there’s a common ancestor of humans and chimpanzees; the chimpanzees are on one branch and the humans (shown) are the other branch. See, for example, Figure S20 in the preprint, for a tree with the complete structure.

      • Thank you. I hadn’t seen Figure S20, or the preprint, but in light of these the above figure makes much more sense to me. Next time I’ll try to find these things myself before posting a lame question. Thanks again.

  7. Pingback: Most viewed on Haldane’s Sieve: August-September 2012 | Haldane's Sieve

  8. Pingback: Haldane’s Sieve sifts through 2012 | Haldane's Sieve
