Migration and interaction in a contact zone: mtDNA variation among Bantu-speakers in southern Africa

Chiara Barbieri, Mário Vicente, Sandra Oliveira, Koen Bostoen, Jorge Rocha, Mark Stoneking, Brigitte Pakendorf

Bantu speech communities expanded over large parts of sub-Saharan Africa within the last 4000-5000 years, reaching different parts of southern Africa 1200-2000 years ago. The Bantu languages subdivide into several major branches, with languages belonging to the Eastern and Western Bantu branches spreading over large parts of Central, Eastern, and Southern Africa. There is still debate whether this linguistic divide is correlated with a genetic distinction between Eastern and Western Bantu speakers. During their expansion, Bantu speakers would have come into contact with diverse local populations, such as the Khoisan hunter-gatherers and pastoralists of southern Africa, with whom they may have intermarried. In this study, we analyze complete mtDNA genome sequences from over 900 Bantu-speaking individuals from Angola, Zambia, Namibia, and Botswana to investigate the demographic processes at play during the last stages of the Bantu expansion. Our results show that most of these Bantu-speaking populations are genetically very homogeneous, with no genetic division between speakers of Eastern and Western Bantu languages. Most of the mtDNA diversity in our dataset is due to different degrees of admixture with autochthonous populations. Only the pastoralist Himba and Herero stand out due to high frequencies of particular L3f and L3d lineages; the latter are also found in the neighboring Damara, who speak a Khoisan language and were foragers and small-stock herders. In contrast, the close cultural and linguistic relatives of the Herero and Himba, the Kuvale, are genetically similar to other Bantu-speakers. Nevertheless, as demonstrated by resampling tests, the genetic divergence of Herero, Himba, and Kuvale is compatible with a common shared ancestry with high levels of drift and differential female admixture with local pre-Bantu populations.


Hierarchical Bayesian model of population structure reveals convergent adaptation to high altitude in human populations

Matthieu Foll, Oscar E. Gaggiotti, Josephine T. Daub, Laurent Excoffier
(Submitted on 18 Feb 2014)

Detecting genes involved in local adaptation is challenging and of fundamental importance in evolutionary, quantitative, and medical genetics. To this aim, a standard strategy is to perform genome scans in populations of different origins and environments, looking for genomic regions of high differentiation. Because shared population history or population sub-structure may lead to an excess of false positives, analyses are often done on multiple pairs of populations, which leads to i) a global loss of power as compared to a global analysis, and ii) the need for multiple-testing corrections. In order to alleviate these problems, we introduce a new hierarchical Bayesian method to detect markers under selection that can deal with complex demographic histories, where sampled populations share part of their history. Simulations show that our approach is both more powerful and less prone to false positive loci than approaches based on separate analyses of pairs of populations or those ignoring existing complex structures. In addition, our method can identify selection occurring at different levels (i.e. population or region-specific adaptation), as well as convergent selection in different regions. We apply our approach to the analysis of a large SNP dataset from low- and high-altitude human populations from America and Asia. The simultaneous analysis of these two geographic areas allows us to identify several new candidate genome regions for altitudinal selection, and we show that convergent evolution among continents has been quite common. In addition to identifying several genes and biological processes involved in high altitude adaptation, we identify two specific biological pathways that could have evolved in both continents to counter toxic effects induced by hypoxia.
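
To make "regions of high differentiation" concrete, here is a minimal sketch of the simplest differentiation statistic used in genome scans, Wright's F_ST for a single biallelic locus. This is a textbook estimator with equal population weights, not the hierarchical Bayesian method the abstract introduces; the function name and the allele frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic locus.

    freqs: allele frequencies of the same allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    # Expected heterozygosity if all populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        # The locus is fixed for the same allele everywhere: no differentiation.
        return 0.0
    # Mean expected heterozygosity within populations.
    h_within = sum(2 * p * (1 - p) for p in freqs) / n
    return (h_total - h_within) / h_total


# Hypothetical frequencies in a low- and a high-altitude population.
print(fst_biallelic([0.25, 0.75]))  # prints 0.25
```

A naive scan would compute this value at every locus and flag the upper tail. The abstract's point is that such outliers can also arise from shared history or sub-structure, which is what a model of population structure is meant to account for.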

Tracing evolutionary links between species

Mike Steel
(Submitted on 16 Feb 2014)

The idea that all life on earth traces back to a common beginning dates back at least to Charles Darwin’s Origin of Species. Ever since, biologists have tried to piece together parts of this ‘tree of life’ based on what we can observe today: fossils, and the evolutionary signal that is present in the genomes and phenotypes of different organisms. Mathematics has played a key role in helping transform genetic data into phylogenetic (evolutionary) trees and networks. Here, I will explain some of the central concepts and basic results in phylogenetics, which benefit from several branches of mathematics, including combinatorics, probability and algebra.

Evolutionary rates for multivariate traits: the role of selection and genetic variation

William Pitchers, Jason B. Wolf, Tom Tregenza, John Hunt, Ian Dworkin

A fundamental question in evolutionary biology is the relative importance of selection and genetic architecture in determining evolutionary rates. Adaptive evolution can be described by the multivariate breeders’ equation (Δz = Gβ), which predicts evolutionary change for a suite of phenotypic traits (Δz) as a product of directional selection acting on them (β) and the genetic variance-covariance matrix for those traits (G). Despite being empirically challenging to estimate, there are enough published estimates of G and β to allow for synthesis of general patterns across species. We use published estimates to test the hypotheses that there are systematic differences in the rate of evolution among trait types, and that these differences are in part due to genetic architecture. We find evidence that sexually selected traits exhibit faster rates of evolution compared to life-history or morphological traits. This difference does not appear to be related to stronger selection on sexually selected traits. Using numerous proposed approaches to quantifying the shape, size and structure of G we examine how these parameters relate to one another, and how they vary among taxonomic and trait groupings. Despite considerable variation, they do not explain the observed differences in evolutionary rates.
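
As a small worked illustration of the breeders' equation quoted in the abstract, the sketch below computes Δz = Gβ for two traits. The function name and the numbers in G and β are hypothetical and chosen only to show how genetic covariance alters the response; they are not estimates from the paper.

```python
def breeders_equation(G, beta):
    """Predicted per-generation response: delta_z = G @ beta.

    G: genetic variance-covariance matrix as a list of rows.
    beta: vector of directional selection gradients, one per trait.
    """
    if any(len(row) != len(beta) for row in G):
        raise ValueError("each row of G must have one entry per trait in beta")
    return [sum(g * b for g, b in zip(row, beta)) for row in G]


# Hypothetical two-trait example with a positive genetic covariance of 0.5.
G = [[1.0, 0.5],
     [0.5, 2.0]]
beta = [0.5, -0.25]  # selection favours more of trait 1 and less of trait 2

delta_z = breeders_equation(G, beta)
print(delta_z)  # prints [0.375, -0.25]
```

With no covariance, trait 1 would respond by 1.0 × 0.5 = 0.5. The positive covariance with trait 2, which is selected in the opposite direction, reduces the response to 0.375, so the structure of G can shape evolutionary rates independently of the strength of selection.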

A Novel Approach for Multi-Domain and Multi-Gene Family Identification Provides Insights into Evolutionary Dynamics of Disease Resistance Genes in Core Eudicot Plants

Johannes A. Hofberger, Beifei Zhou, Haibao Tang, Jonathan DG Jones, M. Eric Schranz

Recent advances in DNA sequencing techniques have resulted in more than forty sequenced plant genomes representing a diverse set of taxa of agricultural, energy, medicinal and ecological importance. However, gene family curation is often only inferred from DNA sequence homology and lacks insights into evolutionary processes contributing to gene family dynamics. In a comparative genomics framework, we integrated multiple lines of evidence provided by gene synteny, sequence homology and protein-based Hidden Markov Modelling to extract homologous super-clusters composed of multi-domain resistance (R)-proteins of the NB-LRR type (for NUCLEOTIDE BINDING/LEUCINE-RICH REPEATS), that are involved in plant innate immunity. To assess the diversity of R-proteins within and between species, we screened twelve eudicot plant genomes including six major crops and found a total of 2,363 NB-LRR genes. Our curated R-proteins set shows a 50% average for tandem duplicates and a 22% fraction of gene copies retained from ancient polyploidy events (ohnologs). We provide evidence for strong positive selection acting on all identified genes and show significant differences in molecular evolution rates (Ka/Ks-ratio) among tandem- (mean=1.59), ohnolog (mean=1.36) and singleton (mean=1.22) R-gene duplicates. To foster the process of gene-edited plant breeding, we report species-specific presence/absence of all 140 NB-LRR genes present in the model plant Arabidopsis and describe four distinct clusters of NB-LRR "gatekeeper" loci sharing syntelogs across all analyzed genomes. In summary, we designed and implemented an easy-to-follow computational framework for super-gene family identification, and provide the most curated set of NB-LRR genes whose genetic versatility among twelve lineages can underpin crop improvement.

The Fates of Mutant Lineages and the Distribution of Fitness Effects of Beneficial Mutations in Laboratory Budding Yeast Populations

Evgeni M. Frenkel, Benjamin H. Good, Michael M. Desai
(Submitted on 13 Feb 2014)

The outcomes of evolution are determined by which mutations occur and fix. In rapidly adapting microbial populations, this process is particularly hard to predict because lineages with different beneficial mutations often spread simultaneously and interfere with one another’s fixation. Hence to predict the fate of any individual variant, we must know the rate at which new mutations create competing lineages of higher fitness. Here, we directly measured the effect of this interference on the fates of specific adaptive variants in laboratory Saccharomyces cerevisiae populations and used these measurements to infer the distribution of fitness effects of new beneficial mutations. To do so, we seeded marked lineages with different fitness advantages into replicate populations and tracked their subsequent frequencies for hundreds of generations. Our results illustrate the transition between strongly advantageous lineages which decisively sweep to fixation and more moderately advantageous lineages that are often outcompeted by new mutations arising during the course of the experiment. We developed an approximate likelihood framework to compare our data to simulations and found that the effects of these competing beneficial mutations were best approximated by an exponential distribution, rather than one with a single effect size. We then used this inferred distribution of fitness effects to predict the rate of adaptation in a set of independent control populations. Finally, we discuss how our experimental design can serve as a screen for rare, large-effect beneficial mutations.

Cell specific eQTL analysis without sorting cells

Harm-Jan Westra, Danny Arends, Tõnu Esko, Marjolein J. Peters, Claudia Schurmann, Katharina Schramm, Johannes Kettunen, Hanieh Yaghootkar, Benjamin Fairfax, Anand Kumar Andiappan, Yang Li, Jingyuan Fu, Juha Karjalainen, Mathieu Platteel, Marijn Visschedijk, Rinse Weersma, Silva Kasela, Lili Milani, Liina Tserel, Pärt Peterson, Eva Reinmaa, Albert Hofman, André G. Uitterlinden, Fernando Rivadeneira, Georg Homuth, Astrid Petersmann, Roberto Lorbeer, Holger Prokisch, Thomas Meitinger, Christian Herder, Michael Roden, Harald Grallert, Samuli Ripatti, Markus Perola, Andrew R. Wood, David Melzer, Luigi Ferrucci, Andrew B. Singleton, Dena G. Hernandez, Julian C. Knight, Rossella Melchiotti, Bernett Lee, Michael Poidinger, Francesca Zolezzi, Anis Larbi, De Yun Wang, Leonard H. van den Berg, Jan H. Veldink, Olaf Rotzschke, Seiko Makino, Timothy Frayling, Veikko Salomaa, Konstantin Strauch, Uwe Völker, Joyce B.J. van Meurs, Andres Metspalu, Cisca Wijmenga, Ritsert C. Jansen, Lude Franke

Expression quantitative trait locus (eQTL) mapping on tissue, organ or whole organism data can detect associations that are generic across cell types. We describe a new method to focus upon specific cell types without first needing to sort cells. We applied the method to whole blood data from 5,683 samples and demonstrate that SNPs associated with Crohn’s disease preferentially affect gene expression within neutrophils.

Multiple Quantitative Trait Analysis Using Bayesian Networks

Marco Scutari, Phil Howell, David J. Balding, Ian Mackay
(Submitted on 12 Feb 2014)

Models for genome-wide prediction and association studies usually target a single phenotypic trait. However, in animal and plant genetics it is common to record information on multiple phenotypes for each individual that will be genotyped. Modeling traits individually disregards the fact that they are most likely associated due to pleiotropy and shared biological basis, thus providing only a partial, confounded view of genetic effects and phenotypic interactions. In this paper we use data from a Multiparent Advanced Generation Inter-Cross (MAGIC) winter wheat population to explore Bayesian networks as a convenient and interpretable framework for the simultaneous modeling of multiple quantitative traits. We show that they are equivalent to multivariate genetic best linear unbiased prediction (GBLUP), and that they outperform single-trait elastic net and single-trait GBLUP in predictive performance. Finally, we discuss their relationship with other additive-effects models and their advantages in inference and interpretation. MAGIC populations provide an ideal setting for this kind of investigation because the very low population structure and large sample size result in predictive models with good power and limited confounding due to relatedness.

Can one hear the shape of a population history?

Junhyong Kim, Elchanan Mossel, Miklós Z. Rácz, Nathan Ross
(Submitted on 11 Feb 2014)

Reconstructing past population size from present day genetic data is a major goal of population genetics. Recent empirical studies infer population size history using coalescent-based models applied to a small number of individuals. While it is known that the allelic spectrum is not sufficient to infer the population size history, the distribution of coalescence times is. Here we provide tight bounds on the amount of information needed to recover the population size history at a certain level of accuracy assuming data given either by exact coalescence times, or given blocks of non-recombinant DNA sequences whose loci have approximately equal times to coalescence. Importantly, we prove lower bounds showing that it is impossible to accurately deduce population histories given limited data.
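
To illustrate why coalescence times carry information about population size, the sketch below evaluates the standard pairwise coalescent survival function P(T > t) = exp(-∫₀ᵗ ds / (2N(s))) for a piecewise-constant history. It assumes a diploid Wright-Fisher model with time in generations; the function name, the epoch format and the sizes are hypothetical, and this is not the bound-proving machinery of the paper.

```python
import math


def coalescence_survival(t, epochs):
    """P(T > t) for the coalescence time T of two lineages.

    epochs: list of (start_time, N) pairs sorted by start_time, with the
    first epoch starting at 0; each size N holds until the next start_time.
    Assumes a diploid population, so the pairwise rate is 1 / (2N).
    """
    hazard = 0.0
    for i, (start, size) in enumerate(epochs):
        if t <= start:
            break
        end = epochs[i + 1][0] if i + 1 < len(epochs) else math.inf
        # Add the coalescence hazard accumulated while inside this epoch.
        hazard += (min(t, end) - start) / (2.0 * size)
    return math.exp(-hazard)


# Hypothetical history: N = 1000 for the most recent 1000 generations,
# N = 500 before that (going backwards in time).
history = [(0, 1000), (1000, 500)]
print(coalescence_survival(2000, history))  # exp(-1.5)
```

Because the survival function is determined by the integrated inverse population size, two different histories give different coalescence time distributions. The paper's question is how much data is needed to tell them apart at a given accuracy.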

Estimating the evolution of human life history traits in age-structured populations

Ryan Baldini

I propose a method that estimates the selection response of all vital rates in an age-structured population. I assume that vital rates are determined by the additive genetic contributions of many loci. The method uses all relatedness information in the sample to inform its estimates of genetic parameters, via an MCMC Bayesian framework. One can use the results to estimate the selection response of any life history trait that is a function of the vital rates, including the age at first reproduction, total lifetime fertility, survival to adulthood, and others. This method closely ties the empirical analysis of life history evolution to dynamically complete models of natural selection, and therefore enjoys some theoretical advantages over other methods. I demonstrate the method on a simulated model of evolution with two age classes. Finally I discuss how the method can be extended to more complicated cases.