Response to a population bottleneck can be used to infer recessive selection

Daniel J. Balick, Ron Do, David Reich, Shamil R. Sunyaev
(Submitted on 11 Dec 2013)

Here we present the first genome-wide statistical test for recessive selection. This test uses explicitly non-equilibrium demographic differences between populations to infer the mode of selection. By analyzing the transient response to a population bottleneck and subsequent re-expansion, we qualitatively distinguish between alleles under additive and recessive selection. We analyze the response of the average number of deleterious mutations per haploid individual and describe the time dependence of this quantity. We introduce a statistic, BR, to compare the number of mutations in different populations and detail its functional dependence on the strength of selection and the intensity of the population bottleneck. This test can be used to detect the predominant mode of selection on the genome-wide or regional level, as well as among a sufficiently large set of medically or functionally relevant alleles.

Evaluating the use of ABBA-BABA statistics to locate introgressed loci

Simon Henry Martin, John William Davey, Chris D Jiggins

Several methods have been proposed to test for introgression across genomes. One method identifies an excess of shared derived alleles between taxa using Patterson’s D statistic, but does not establish which loci show such an excess or whether the excess is due to introgression or ancestral population structure. Smith and Kronforst (2013) propose that, at loci identified as outliers for the D statistic, introgression is indicated by a reduction in absolute genetic divergence (dXY) between taxa with shared ancestry, whereas ancestral structure produces no reduction in dXY at these loci. Here, we use simulations and Heliconius butterfly data to investigate the behavior of D when applied to small genomic regions. We find that D imperfectly identifies loci with shared ancestry in many scenarios due to a bias in regions with few segregating sites. A related statistic, f, is mostly robust to this bias but becomes less accurate as gene flow becomes more ancient. Although reduced dXY does indicate introgression when loci with shared ancestry can be accurately detected, both D and f systematically identify regions of lower dXY in the presence of both gene flow and ancestral structure, so detecting a reduction in dXY at D or f outliers is not sufficient to infer introgression. However, models including gene flow produced a larger reduction in dXY than models including ancestral structure in almost all cases, so this reduction may be suggestive, but not conclusive, evidence for introgression.
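To make the small-window bias concrete, here is a minimal sketch of Patterson's D computed from derived-allele frequencies in a window. The function name and the frequency-based form of the ABBA and BABA counts are illustrative assumptions, not code from the paper.

```python
def patterson_d(p1, p2, p3, p_out):
    """Patterson's D for one window.

    Each argument is a list of per-site derived-allele frequencies in
    populations P1, P2, P3 and the outgroup (hypothetical input layout).
    """
    abba = 0.0
    baba = 0.0
    for f1, f2, f3, fo in zip(p1, p2, p3, p_out):
        # ABBA: derived allele shared by P2 and P3, absent from P1 and outgroup
        abba += (1 - f1) * f2 * f3 * (1 - fo)
        # BABA: derived allele shared by P1 and P3, absent from P2 and outgroup
        baba += f1 * (1 - f2) * f3 * (1 - fo)
    total = abba + baba
    if total == 0:
        return float("nan")  # no informative sites in this window
    return (abba - baba) / total
```

With a single informative site, for example `patterson_d([0], [1], [1], [0])`, D can only take an extreme value, which is one way to see why windows with few segregating sites yield unreliable outliers.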

Genome-wide scan of 29,141 African Americans finds no evidence of selection since admixture

Gaurav Bhatia, Arti Tandon, Melinda C. Aldrich, Christine B. Ambrosone, Christopher Amos, Elisa V. Bandera, Sonja I. Berndt, Leslie Bernstein, William J. Blot, Cathryn H. Bock, Neil Caporaso, Graham Casey, Sandra L. Deming, W. Ryan Diver, Susan M. Gapstur, Elizabeth M. Gillanders, Curtis C. Harris, Brian E. Henderson, Sue A. Ingles, William Isaacs, Esther M. John, Rick A. Kittles, Emma Larkin, Lorna H. McNeill, Robert C. Millikan, Adam Murphy, Christine Neslund-Dudas, Sarah Nyante, Michael F. Press, Jorge L. Rodriguez-Gil, Benjamin A. Rybicki, Ann G. Schwartz, Lisa B. Signorello, Margaret Spitz, Sara S. Strom, Margaret A. Tucker, John K. Wiencke, John S. Witte, Xifeng Wu, Yuko Yamamura, Krista A. Zanetti, Wei Zheng, Regina G. Ziegler, Stephen J. Chanock, Christopher A. Haiman, David Reich, Alkes L. Price
(Submitted on 10 Dec 2013)

We scanned through the genomes of 29,141 African Americans, searching for loci where the average proportion of African ancestry deviates significantly from the genome-wide average. We failed to find any genome-wide significant deviations, and conclude that any selection in African Americans since admixture is sufficiently weak that it falls below the threshold of our power to detect it using a large sample size. These results stand in contrast to the findings of a recent study of selection in African Americans. That study, which had 15 times fewer samples, reported six loci with significant deviations. We show that the discrepancy is likely due to insufficient correction for multiple hypothesis testing in the previous study. The same study reported 14 loci that showed greater population differentiation between African Americans and Nigerian Yoruba than would be expected in the absence of natural selection. Four such loci were previously shown to be genome-wide significant and likely to be affected by selection, but we show that most of the 10 additional loci are likely to be false positives. Additionally, the most parsimonious explanation for the loci that have significant evidence of unusual differentiation in frequency between Nigerians and African Americans is selection in Africa prior to their forced migration to the Americas.
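The role of multiple-testing correction in such a scan can be illustrated with a small sketch. It assumes a simple z-test of each locus against the genome-wide mean ancestry and a Bonferroni threshold; the function name, the inputs and the choice of correction are assumptions made for illustration, not the procedure used in the study.

```python
from statistics import NormalDist

def ancestry_deviation_hits(local_ancestry, standard_errors, alpha=0.05):
    """Return indices of loci whose mean ancestry deviates from the
    genome-wide average after Bonferroni correction (illustrative only)."""
    n_loci = len(local_ancestry)
    genome_mean = sum(local_ancestry) / n_loci
    threshold = alpha / n_loci  # Bonferroni: divide alpha by the number of tests
    normal = NormalDist()
    hits = []
    for i, (ancestry, se) in enumerate(zip(local_ancestry, standard_errors)):
        z = (ancestry - genome_mean) / se
        p_value = 2 * (1 - normal.cdf(abs(z)))  # two-sided p-value
        if p_value < threshold:
            hits.append(i)
    return hits
```

Under these assumptions, a locus that looks significant at a nominal 0.05 level can easily fail the corrected threshold once many loci are tested, which is the kind of discrepancy the abstract describes.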

Probabilistic Graphical Model Representation in Phylogenetics

Sebastian Höhna, Tracy A. Heath, Bastien Boussau, Michael J. Landis, Fredrik Ronquist, John P. Huelsenbeck
(Submitted on 9 Dec 2013)

Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (1) reproducibility of an analysis, (2) model development and (3) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and non-specialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies.
Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well suited to teaching statistical models, facilitating communication among phylogeneticists, and developing generic software for simulation and statistical inference.
Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis-Hastings or Gibbs sampling of the posterior distribution.

Probabilistic models of genetic variation in structured populations applied to global human studies

Wei Hao, Minsun Song, John D. Storey
(Submitted on 7 Dec 2013)

Modern population genetics studies typically involve genome-wide genotyping of individuals from a diverse network of ancestries. An important, unsolved problem is how to formulate and estimate probabilistic models of observed genotypes that allow for complex population structure. We formulate two general probabilistic models, and we propose computationally efficient algorithms to estimate them. First, we show how principal component analysis (PCA) can be utilized to estimate a general model that includes the well-known Pritchard-Stephens-Donnelly mixed-membership model as a special case. Noting some drawbacks of this approach, we introduce a new “logistic factor analysis” (LFA) framework that seeks to directly model the logit transformation of probabilities underlying observed genotypes in terms of latent variables that capture population structure. We demonstrate these advances on data from the Human Genome Diversity Panel and the 1000 Genomes Project, where we are able to identify SNPs that are highly differentiated with respect to structure while making minimal modeling assumptions.

Human blood genotypes dynamics

Timur Sadykov
(Submitted on 9 Dec 2013)

We give a complete closed form description of the evolution of human blood genotypes frequencies (in the ABO and Rh classification) after any (finite or infinite) number of generations and for any initial distribution.

The time-dependent reconstructed evolutionary process with a key-role for mass-extinction events

Sebastian Höhna
(Submitted on 9 Dec 2013)

The homogeneous reconstructed evolutionary process is a birth-death process without observed extinct lineages. Each species evolves independently with the same diversification rates (speciation rate λ(t) and extinction rate μ(t)) that may change over time. The process is commonly applied to model species diversification where the data are reconstructed phylogenies, e.g., trees reconstructed from present-day molecular data, and used to infer diversification rates.
In the present paper I develop the general probability density of a reconstructed tree under any time-dependent birth-death process. I elaborate on how to adapt this probability density if conditioned on survival of one or two initial lineages, or on having sampled n species, and show how to transform between the probability density of a reconstructed tree and the probability density of the speciation times.
I demonstrate the use of the general time-dependent probability density functions by deriving the probability density of a reconstructed tree under a birth-death-shift model with explicit mass-extinction events. I enrich this compendium by providing and discussing several special cases, including: the pure birth process, the pure death process, the birth-death process and the critical branching process. Thus, I provide here most of the commonly used birth-death models in a unified framework (e.g., same condition and same data) with common notation.

Species Delimitation using Genome-Wide SNP Data

Adam Leache, Matthew Fujita, Vladimir Minin, Remco Bouckaert

The multi-species coalescent has provided important progress for evolutionary inferences, including increasing the statistical rigor and objectivity of comparisons among competing species delimitation models. However, Bayesian species delimitation methods typically require brute force integration over gene trees via Markov chain Monte Carlo (MCMC), which introduces a large computational burden and precludes their application to genomic-scale data. Here we combine a recently introduced dynamic programming algorithm for estimating species trees that bypasses MCMC integration over gene trees with sophisticated methods for estimating marginal likelihoods, needed for Bayesian model selection, to provide a rigorous and computationally tractable technique for genome-wide species delimitation. We provide a critical yet simple correction that brings the likelihoods of different species trees, and more importantly their corresponding marginal likelihoods, to the same common denominator, which enables direct and accurate comparisons of competing species delimitation models using Bayes factors. We test this approach, which we call Bayes factor delimitation (*with genomic data; BFD*), using common species delimitation scenarios with computer simulations. Varying the number of loci and the number of samples suggests that the approach can distinguish the true model even with few loci and limited samples per species. Misspecification of the prior for population size θ has little impact on support for the true model. We apply the approach to West African forest geckos (Hemidactylus fasciatus complex) using genome-wide SNP data. This new Bayesian method for species delimitation builds on a growing trend for objective species delimitation methods with explicit model assumptions that are easily tested.
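Once marginal likelihoods are on a common scale, comparing delimitation models by Bayes factor reduces to simple arithmetic on log marginal likelihoods. The sketch below assumes the log marginal likelihoods have already been estimated elsewhere; the function name, the model labels and the numbers in the usage line are hypothetical.

```python
def rank_models(log_marginal_likelihoods):
    """Rank models by log marginal likelihood.

    Takes a dict mapping model name -> estimated log marginal likelihood and
    returns (name, 2 ln BF relative to the best model), best model first.
    """
    best = max(log_marginal_likelihoods.values())
    ranked = sorted(
        log_marginal_likelihoods.items(), key=lambda item: item[1], reverse=True
    )
    # 2 ln BF = 2 * (ln ML of this model - ln ML of the best model), so it is <= 0
    return [(name, 2.0 * (log_ml - best)) for name, log_ml in ranked]
```

For instance, `rank_models({"lump": -1050.0, "split": -1040.0})` places the hypothetical "split" model first and reports a 2 ln BF of -20 for "lump" relative to it.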

Formal properties of the probability of fixation: identities, inequalities and approximations

Formal properties of the probability of fixation: identities, inequalities and approximations
David M. McCandlish, Charles L. Epstein, Joshua B. Plotkin
(Submitted on 5 Dec 2013)

The formula for the probability of fixation of a new mutation is widely used in theoretical population genetics and molecular evolution. Here we derive a series of identities, inequalities and approximations for the exact probability of fixation of a new mutation under the Moran process (equivalent results hold for the approximate probability of fixation for the Wright-Fisher process after an appropriate change of variables). We show that the behavior of the logarithm of the probability of fixation is particularly simple when the selection coefficient is measured as a difference of Malthusian fitnesses, and we exploit this simplicity to derive several inequalities and approximations. We also present a comprehensive comparison of both existing and new approximations for the probability of fixation, highlighting in particular approximations that result in a reversible Markov chain when used to model the dynamics of evolution under weak mutation.
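For readers who want to experiment, the following sketch evaluates the standard textbook fixation probability of a single new mutant in a Moran process of N haploid individuals, with the mutant's relative fitness written as e^s so that s is a difference of Malthusian fitnesses. The function name and this parameterization are assumptions for illustration; the paper's own notation may differ.

```python
import math

def moran_fixation_probability(s, n):
    """Fixation probability of one new mutant among n haploids (Moran process).

    s is the difference in Malthusian fitness, so relative fitness is exp(s).
    Uses (1 - e^{-s}) / (1 - e^{-s n}), with the neutral limit 1/n at s == 0.
    """
    if n < 1:
        raise ValueError("population size must be at least 1")
    if s == 0:
        return 1.0 / n
    # expm1 keeps precision for small |s|; the two minus signs cancel
    return math.expm1(-s) / math.expm1(-s * n)
```

In this parameterization the ratio of the fixation probabilities for s and -s equals e^{s(N-1)}, a simple example of how the logarithm of the fixation probability behaves when s is measured on the Malthusian scale.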

Author post: Ploidy and the Predictability of Evolution in Fisher’s Geometric Model

This guest post is by Sandeep Venkataram and Dmitri A. Petrov on their paper (with Diamantis Sellis), Venkataram et al., “Ploidy and the Predictability of Evolution in Fisher’s Geometric Model”.

Since Gould’s famous thought-experiment (Gould, 1990) on “replaying the tape of life”, scientists have been interested in the predictability of evolution. Gould wondered whether it is possible to forecast evolution, and determine the path or the final destination of the evolutionary process from a given starting population. It is also possible, however, to ask whether we can retrocast evolution, and reconstruct the true evolutionary trajectory given the final state and possibly the ancestral state. Forward predictability analysis tries to predict the future evolutionary trajectory or future adapted state of an evolving population, while backwards predictability analysis tries to determine the likelihood of the possible alternative adaptive trajectories that lead to the observed adapted state.

Predictability has been empirically studied to a limited extent due to the laborious nature of such studies (e.g. Ferea et al 1999, Weinreich et al 2006 and Tenaillon et al 2012). We overcome these limitations by analyzing simulated adaptive walks under Fisher’s geometric model. To our knowledge, we are the first to study both of these types of predictability in a single system. We compare the predictabilities of haploid and diploid simulations, and find that forward and backward predictability are inversely correlated in this model. We attribute this inverse correlation to the presence of overdominant mutations and balanced polymorphisms in our diploid simulations and the lack of such mutations in the haploids (Sellis et al 2011).
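A toy version of why overdominance arises in diploids under Fisher’s geometric model can be sketched as follows. It assumes a single trait, a Gaussian fitness function around the optimum, and a heterozygote phenotype exactly halfway between the two homozygotes; these choices and all function names are simplifying assumptions, not the simulation setup of the paper.

```python
import math

def fitness(phenotype, optimum=0.0):
    """Gaussian fitness: highest at the optimum, declining with distance."""
    return math.exp(-((phenotype - optimum) ** 2))

def genotype_fitnesses(wild_type, effect):
    """Fitness of wild-type homozygote, heterozygote and mutant homozygote.

    The heterozygote carries half the mutational effect (assumed codominance
    at the phenotype level).
    """
    heterozygote = wild_type + effect / 2
    mutant_homozygote = wild_type + effect
    return fitness(wild_type), fitness(heterozygote), fitness(mutant_homozygote)

def is_overdominant(wild_type, effect):
    """True when the heterozygote is fitter than both homozygotes."""
    w_wt, w_het, w_hom = genotype_fitnesses(wild_type, effect)
    return w_het > w_wt and w_het > w_hom
```

With a wild type at phenotype 1.0, a large mutation of effect -1.5 overshoots the optimum as a homozygote but lands close to it as a heterozygote, so `is_overdominant(1.0, -1.5)` is true, while a small step such as -0.5 is not overdominant. In a haploid there is no intermediate heterozygote, so this route to a balanced polymorphism does not exist.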

We observe that the presence of balanced polymorphisms in diploids leads to a number of novel dynamics when studying predictability. It greatly increases the phenotypic diversity in diploid adaptive walks, leading to low forward predictability relative to haploids. We also detect mutations which are stably maintained but subsequently lost in diploid adaptive walks, and are thus hidden from sampling at the end of the simulation. We show that these hidden mutations, which also go unobserved in almost all empirical studies, strongly limit the inferences that can be made when analyzing backward predictability. Finally, we observe that when the same set of mutations is introduced into a diploid population in different orders, the final adapted allele is often balanced against different intermediate alleles, resulting in different adapted population states.

Our results show the importance of considering stable polymorphisms when analyzing adaptive trajectories, and detail, for the first time, some of the limitations in conducting such analysis using empirical data. In natural populations, stable polymorphisms can be generated in both haploids and diploids by a wide range of mechanisms, including niche construction, frequency dependent selection, balancing selection and spatially and temporally fluctuating selection pressures. Therefore, our results should be relevant for all natural populations, regardless of ploidy.

Ferea, T., Botstein, D., Brown, P. O., & Rosenzweig, R. F. (1999). Systematic changes in gene expression patterns following. Proceedings of the National Academy of Sciences of the United States of America, 96(August), 9721–9726.

Gould, S. J. (1990). Wonderful Life: The Burgess Shale and the Nature of History (p. 352). W. W. Norton & Company.

Sellis, D., Callahan, B., Petrov, D. A., & Messer, P. W. (2011). Heterozygote advantage as a natural consequence of adaptation in diploids. Proceedings of the National Academy of Sciences of the United States of America, 2011, 1–6. doi:10.1073/pnas.1114573108

Tenaillon, O., Rodriguez-Verdugo, A., Gaut, R. L., McDonald, P., Bennett, A. F., Long, A. D., & Gaut, B. S. (2012). The Molecular Diversity of Adaptive Convergence. Science, 335(6067), 457–461. doi:10.1126/science.1212986

Weinreich, D. M., Delaney, N. F., DePristo, M. A., & Hartl, D. L. (2006). Darwinian evolution can follow only very few mutational paths to fitter proteins. Science, 312(5770), 111–114. doi:10.1126/science.1123539