WFABC: a Wright-Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data
Matthieu Foll, Hyunjin Shim, Jeffrey D. Jensen
With novel developments in sequencing technologies, time-sampled data are becoming more available and accessible. Naturally, there have been parallel efforts to infer population genetic parameters from these datasets. Here, we compare and analyze four recent approaches based on the Wright-Fisher model for inferring selection coefficients (s) given effective population size (Ne), using simulated temporal datasets. Furthermore, we demonstrate the advantage of a recently proposed ABC-based method that is able to correctly infer genome-wide average Ne from time-serial data, which is then set as a prior for inferring per-site selection coefficients accurately and precisely. We implement this ABC method in new software and apply it to a classical time-serial dataset of the medionigra genotype in the moth Panaxia dominula. By implementing an estimator of the dominance ratio (h), we show that a recessive lethal model is the best explanation for the observed variation in allele frequency.
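The two-step idea in the abstract above (fix or estimate Ne, then infer a per-site s by ABC) can be illustrated with a minimal rejection-ABC sketch. This is not the WFABC software: the function names, the haploid fitness scheme, the uniform prior on s, the use of raw allele frequencies as summary statistics, and the absence of sampling noise are all simplifying assumptions made here for illustration.

```python
import random

def wf_trajectory(x0, s, ne, generations, rng):
    """Simulate a haploid Wright-Fisher allele-frequency trajectory.

    Assumption: the focal allele has relative fitness 1 + s, and the
    population consists of `ne` haploid individuals.
    """
    x = x0
    traj = [x]
    for _ in range(generations):
        # Expected frequency after selection.
        p = x * (1.0 + s) / (1.0 + s * x)
        # Genetic drift: binomial resampling of ne copies.
        copies = sum(1 for _ in range(ne) if rng.random() < p)
        x = copies / ne
        traj.append(x)
    return traj

def abc_rejection(observed, sample_gens, x0, ne, n_sims, keep, prior, rng):
    """Keep the `keep` prior draws of s whose simulations best match the data."""
    draws = []
    for _ in range(n_sims):
        s = rng.uniform(*prior)  # hypothetical uniform prior on s
        traj = wf_trajectory(x0, s, ne, max(sample_gens), rng)
        # Squared distance between simulated and observed frequencies.
        dist = sum((traj[g] - o) ** 2 for g, o in zip(sample_gens, observed))
        draws.append((dist, s))
    draws.sort()
    return [s for _, s in draws[:keep]]

# Toy usage: pseudo-observed data simulated with s = 0.2 and Ne = 200.
rng = random.Random(1)
sample_gens = [0, 5, 10, 15, 20]
truth = wf_trajectory(0.1, 0.2, 200, 20, rng)
observed = [truth[g] for g in sample_gens]
posterior = abc_rejection(observed, sample_gens, 0.1, 200, 500, 25, (-0.2, 0.5), rng)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted values in `posterior` approximate the posterior distribution of s under these assumptions; a realistic analysis would also model the finite sample size at each time point and would draw Ne from its own estimated distribution rather than fixing it.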
Thinking too positive? Revisiting current methods of population-genetic selection inference
Claudia Bank, Gregory B Ewing, Anna Ferrer-Admetlla, Matthieu Foll, Jeffrey D Jensen
In the age of next-generation sequencing, the availability of increasing amounts and quality of data at decreasing cost ought to allow for a better understanding of how natural selection is shaping the genome than ever before. Yet, alternative forces such as demography and background selection obscure the footprints of positive selection that we would like to identify. Here, we illustrate recent developments in this area, and outline a roadmap for improved selection inference. We argue (1) that the development and obligatory use of advanced simulation tools is necessary for improved identification of selected loci, (2) that genomic information from multiple time points will enhance the power of inference, and (3) that results from experimental evolution should be utilized to better inform population-genomic studies.
The projection of a test genome onto a reference population and applications to humans and archaic hominins
Melinda A Yang, Montgomery Slatkin
We introduce a method for comparing a test genome with numerous genomes from a reference population. Sites in the test genome are given a weight w that depends on the allele frequency x in the reference population. The projection of the test genome onto the reference population is the average weight for each x, w(x). The weight is assigned in such a way that if the test genome is a random sample from the reference population, w(x)=1. Using analytic theory, numerical analysis, and simulations, we show how the projection depends on the time of population splitting, the history of admixture, and changes in past population size. The projection is sensitive to small amounts of past admixture, to the direction of admixture, and to admixture from a population not sampled (a ghost population). We compute the projection of several human and two archaic genomes onto three reference populations from the 1000 Genomes Project, Europeans (CEU), Han Chinese (CHB), and Yoruba (YRI), and discuss the consistency of our analysis with previously published results for European and Yoruba demographic history. Including higher amounts of admixture between Europeans and Yoruba soon after their separation and low amounts of admixture more recently can resolve discrepancies between the projections and demographic inferences from some previous studies.
Sampling through time and phylodynamic inference with coalescent and birth-death models
Erik M. Volz, Simon DW Frost
(Submitted on 28 Aug 2014)
Many population genetic models have been developed for the purpose of inferring population size and growth rates from random samples of genetic data. We examine two popular approaches to this problem, the coalescent and the birth-death-sampling model, in the context of estimating population size and birth rates in a population growing exponentially according to the birth-death branching process. For sequences sampled at a single time, we find that the coalescent and the birth-death-sampling model give virtually indistinguishable results in terms of the growth rates and fraction of the population sampled, even when sampling from a small population. For sequences sampled at multiple time points, we find that the birth-death model estimators are subject to large bias if the sampling process is misspecified. Since birth-death-sampling models incorporate a model of the sampling process, we show how much of the statistical power of birth-death-sampling models arises from the sequence of sample times and not from the genealogical tree. This motivates the development of a new coalescent estimator, which is augmented with a model of the known sampling process and is potentially more precise than the coalescent that does not use sample time information.
A genomic map of the effects of linked selection in Drosophila
Eyal Elyashiv, Shmuel Sattath, Tina T. Hu, Alon Strustovsky, Graham McVicker, Peter Andolfatto, Graham Coop, Guy Sella
(Submitted on 23 Aug 2014)
Natural selection at one site shapes patterns of genetic variation at linked sites. Quantifying the effects of ‘linked selection’ on levels of genetic diversity is key to making reliable inference about demography, building a null model in scans for targets of adaptation, and learning about the dynamics of natural selection. Here, we introduce the first method that jointly infers parameters of distinct modes of linked selection, notably background selection and selective sweeps, from genome-wide diversity data, functional annotations and genetic maps. The central idea is to calculate the probability that a neutral site is polymorphic given local annotations, substitution patterns, and recombination rates. Information is then combined across sites and samples using composite likelihood in order to estimate genome-wide parameters of distinct modes of selection. In addition to parameter estimation, this approach yields a map of the expected neutral diversity levels along the genome. To illustrate the utility of our approach, we apply it to genome-wide resequencing data from 125 lines in Drosophila melanogaster and reliably predict diversity levels at the 1Mb scale. Our results corroborate estimates of a high fraction of beneficial substitutions in proteins and untranslated regions (UTRs). They allow us to distinguish between the contribution of sweeps and other modes of selection around amino acid substitutions and to uncover evidence for pervasive sweeps in UTRs. Our inference further suggests a substantial effect of linked selection from non-classic sweeps. More generally, we demonstrate that linked selection has had a larger effect in reducing diversity levels and increasing their variance in D. melanogaster than previously appreciated.
Robust Population Structure Inference and Correction in the Presence of Known or Cryptic Relatedness
Matthew P Conomos, Michael B Miller, Timothy A Thornton
Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multi-dimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using ten (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness.
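The core step described above, PCA on an unrelated subset followed by prediction of components for everyone else, might be sketched as follows. This is not the PC-AiR implementation: the function name is hypothetical, the unrelated subset is assumed to be given (PC-AiR selects it algorithmically), and the remaining individuals are projected using SNP loadings, which is one common choice assumed here rather than taken from the abstract.

```python
import numpy as np

def pca_with_projection(genotypes, unrelated_idx, n_components=2):
    """PCA on an unrelated subset, then projection of all individuals.

    genotypes: (individuals x SNPs) array of 0/1/2 allele counts.
    unrelated_idx: indices of the (assumed given) unrelated subset.
    """
    g = np.asarray(genotypes, dtype=float)
    ref = g[unrelated_idx]
    # Allele frequencies estimated from the unrelated subset only.
    freq = ref.mean(axis=0) / 2.0
    keep = (freq > 0) & (freq < 1)  # drop SNPs monomorphic in the subset
    scale = np.sqrt(2.0 * freq[keep] * (1.0 - freq[keep]))
    z = (g[:, keep] - 2.0 * freq[keep]) / scale
    # Principal axes come from the unrelated subset alone.
    _, _, vt = np.linalg.svd(z[unrelated_idx], full_matrices=False)
    loadings = vt[:n_components].T
    # Everyone, related or not, is scored on those axes.
    return z @ loadings

# Toy usage with random genotypes; even-numbered rows play the unrelated subset.
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(12, 50))
unrelated_idx = [0, 2, 4, 6, 8, 10]
scores = pca_with_projection(genotypes, unrelated_idx)
```

Estimating both the allele frequencies and the principal axes from the unrelated subset keeps family structure among relatives from driving the leading components, which is the motivation the abstract gives for the approach.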
The impact of macroscopic epistasis on long-term evolutionary dynamics
Benjamin H. Good, Michael M. Desai
(Submitted on 18 Aug 2014)
Genetic interactions can strongly influence the fitness effects of individual mutations, yet the impact of these epistatic interactions on evolutionary dynamics remains poorly understood. Here we investigate the evolutionary role of epistasis over 50,000 generations in a well-studied laboratory evolution experiment in E. coli. The extensive duration of this experiment provides a unique window into the effects of epistasis during long-term adaptation to a constant environment. Guided by analytical results in the weak-mutation limit, we develop a computational framework to assess the compatibility of a given epistatic model with the observed patterns of fitness gain and mutation accumulation through time. We find that the average fitness trajectory alone provides little power to distinguish between competing models, including those that lack any direct epistatic interactions between mutations. However, when combined with the mutation trajectory, these observables place strong constraints on the set of possible models of epistasis, ruling out most existing explanations of the data. Instead, we find the strongest support for a “two-epoch” model of adaptation, in which an initial burst of diminishing returns epistasis is followed by a steady accumulation of mutations under a constant distribution of fitness effects. Our results highlight the need for additional DNA sequencing of these populations, as well as for more sophisticated models of epistasis that are compatible with all of the experimental data.