The Fates of Mutant Lineages and the Distribution of Fitness Effects of Beneficial Mutations in Laboratory Budding Yeast Populations

Evgeni M. Frenkel, Benjamin H. Good, Michael M. Desai
(Submitted on 13 Feb 2014)

The outcomes of evolution are determined by which mutations occur and fix. In rapidly adapting microbial populations, this process is particularly hard to predict because lineages with different beneficial mutations often spread simultaneously and interfere with one another’s fixation. Hence to predict the fate of any individual variant, we must know the rate at which new mutations create competing lineages of higher fitness. Here, we directly measured the effect of this interference on the fates of specific adaptive variants in laboratory Saccharomyces cerevisiae populations and used these measurements to infer the distribution of fitness effects of new beneficial mutations. To do so, we seeded marked lineages with different fitness advantages into replicate populations and tracked their subsequent frequencies for hundreds of generations. Our results illustrate the transition between strongly advantageous lineages which decisively sweep to fixation and more moderately advantageous lineages that are often outcompeted by new mutations arising during the course of the experiment. We developed an approximate likelihood framework to compare our data to simulations and found that the effects of these competing beneficial mutations were best approximated by an exponential distribution, rather than one with a single effect size. We then used this inferred distribution of fitness effects to predict the rate of adaptation in a set of independent control populations. Finally, we discuss how our experimental design can serve as a screen for rare, large-effect beneficial mutations.
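The contrast between an exponential distribution of fitness effects and a single effect size can be illustrated with a minimal sketch. This is not the authors' likelihood framework or simulation code; the mean effect (0.02), the focal lineage's advantage (0.05), and all function names are hypothetical values chosen only for illustration. The sketch asks what fraction of new beneficial mutations would, on their own, exceed the focal lineage's advantage under each distribution.

```python
import math
import random


def sample_effects(kind, mean_effect, n, rng):
    """Draw n beneficial fitness effects from a hypothetical distribution.

    kind="exponential": effects are exponentially distributed with the given mean.
    kind="single": every mutation has exactly the same effect (the mean).
    """
    if kind == "exponential":
        # expovariate takes the rate, which is 1 / mean.
        return [rng.expovariate(1.0 / mean_effect) for _ in range(n)]
    if kind == "single":
        return [mean_effect] * n
    raise ValueError(f"unknown distribution kind: {kind!r}")


def fraction_exceeding(effects, focal_advantage):
    """Fraction of sampled effects strictly larger than the focal advantage."""
    return sum(s > focal_advantage for s in effects) / len(effects)


rng = random.Random(0)   # fixed seed so the sketch is reproducible
mean_effect = 0.02       # hypothetical mean effect of new beneficial mutations
focal = 0.05             # hypothetical advantage of the seeded lineage

exp_draws = sample_effects("exponential", mean_effect, 20000, rng)
single_draws = sample_effects("single", mean_effect, 20000, rng)

print("exponential:", fraction_exceeding(exp_draws, focal))
print("single effect:", fraction_exceeding(single_draws, focal))
print("analytic exponential tail:", math.exp(-focal / mean_effect))
```

Under the single-effect assumption no individual mutation can exceed a focal advantage larger than that effect, whereas the exponential tail always leaves some probability of a fitter competitor. The sketch ignores mutations accumulating within a lineage, which the experiment itself does not.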

Cell specific eQTL analysis without sorting cells

Harm-Jan Westra, Danny Arends, Tõnu Esko, Marjolein J. Peters, Claudia Schurmann, Katharina Schramm, Johannes Kettunen, Hanieh Yaghootkar, Benjamin Fairfax, Anand Kumar Andiappan, Yang Li, Jingyuan Fu, Juha Karjalainen, Mathieu Platteel, Marijn Visschedijk, Rinse Weersma, Silva Kasela, Lili Milani, Liina Tserel, Pärt Peterson, Eva Reinmaa, Albert Hofman, André G. Uitterlinden, Fernando Rivadeneira, Georg Homuth, Astrid Petersmann, Roberto Lorbeer, Holger Prokisch, Thomas Meitinger, Christian Herder, Michael Roden, Harald Grallert, Samuli Ripatti, Markus Perola, Andrew R. Wood, David Melzer, Luigi Ferrucci, Andrew B. Singleton, Dena G. Hernandez, Julian C. Knight, Rossella Melchiotti, Bernett Lee, Michael Poidinger, Francesca Zolezzi, Anis Larbi, De Yun Wang, Leonard H. van den Berg, Jan H. Veldink, Olaf Rotzschke, Seiko Makino, Timothy Frayling, Veikko Salomaa, Konstantin Strauch, Uwe Völker, Joyce B.J. van Meurs, Andres Metspalu, Cisca Wijmenga, Ritsert C. Jansen, Lude Franke

Expression quantitative trait locus (eQTL) mapping on tissue, organ or whole organism data can detect associations that are generic across cell types. We describe a new method to focus upon specific cell types without first needing to sort cells. We applied the method to whole blood data from 5,683 samples and demonstrate that SNPs associated with Crohn’s disease preferentially affect gene expression within neutrophils.

Multiple Quantitative Trait Analysis Using Bayesian Networks

Marco Scutari, Phil Howell, David J. Balding, Ian Mackay
(Submitted on 12 Feb 2014)

Models for genome-wide prediction and association studies usually target a single phenotypic trait. However, in animal and plant genetics it is common to record information on multiple phenotypes for each individual that will be genotyped. Modeling traits individually disregards the fact that they are most likely associated due to pleiotropy and shared biological basis, thus providing only a partial, confounded view of genetic effects and phenotypic interactions. In this paper we use data from a Multiparent Advanced Generation Inter-Cross (MAGIC) winter wheat population to explore Bayesian networks as a convenient and interpretable framework for the simultaneous modeling of multiple quantitative traits. We show that they are equivalent to multivariate genetic best linear unbiased prediction (GBLUP), and that they outperform single-trait elastic net and single-trait GBLUP in predictive performance. Finally, we discuss their relationship with other additive-effects models and their advantages in inference and interpretation. MAGIC populations provide an ideal setting for this kind of investigation because the very low population structure and large sample size result in predictive models with good power and limited confounding due to relatedness.

Can one hear the shape of a population history?

Junhyong Kim, Elchanan Mossel, Miklós Z. Rácz, Nathan Ross
(Submitted on 11 Feb 2014)

Reconstructing past population size from present day genetic data is a major goal of population genetics. Recent empirical studies infer population size history using coalescent-based models applied to a small number of individuals. While it is known that the allelic spectrum is not sufficient to infer the population size history, the distribution of coalescence times is. Here we provide tight bounds on the amount of information needed to recover the population size history at a certain level of accuracy assuming data given either by exact coalescence times, or given blocks of non-recombinant DNA sequences whose loci have approximately equal times to coalescence. Importantly, we prove lower bounds showing that it is impossible to accurately deduce population histories given limited data.

Estimating the evolution of human life history traits in age-structured populations

Ryan Baldini

I propose a method that estimates the selection response of all vital rates in an age-structured population. I assume that vital rates are determined by the additive genetic contributions of many loci. The method uses all relatedness information in the sample to inform its estimates of genetic parameters, via an MCMC Bayesian framework. One can use the results to estimate the selection response of any life history trait that is a function of the vital rates, including the age at first reproduction, total lifetime fertility, survival to adulthood, and others. This method closely ties the empirical analysis of life history evolution to dynamically complete models of natural selection, and therefore enjoys some theoretical advantages over other methods. I demonstrate the method on a simulated model of evolution with two age classes. Finally, I discuss how the method can be extended to more complicated cases.

Population genetics on islands connected by an arbitrary network: An analytic approach

George W A Constable, Alan J McKane
(Submitted on 11 Feb 2014)

We analyse a model consisting of a population of individuals which is subdivided into a finite set of demes, each of which has a fixed but differing number of individuals. The individuals can reproduce, die and migrate between the demes according to an arbitrary migration network. They are haploid, with two alleles present in the population; frequency independent selection is also incorporated, where the strength and direction of selection can vary from deme to deme. The system is formulated as an individual-based model, and the diffusion approximation systematically applied to express it as a set of nonlinear coupled stochastic differential equations. These can be made amenable to analysis through the elimination of fast-time variables. The resulting reduced model is analysed in a number of situations, including migration-selection balance leading to a polymorphic equilibrium of the two alleles, and an illustration of how the subdivision of the population can lead to non-trivial behaviour in the case where the network is a simple hub. The method we develop is systematic, may be applied to any network, and agrees well with the results of simulations in all cases studied and across a wide range of parameter values.

Evidence for widespread positive and negative selection in coding and conserved noncoding regions of Capsella grandiflora

Robert Williamson, Emily B Josephs, Adrian E Platts, Khaled M Hazzouri, Annabelle Haudry, Mathieu Blanchette, Stephen I Wright

The extent to which both positive and negative selection vary across different portions of plant genomes remains poorly understood. Here we sequence whole genomes of 13 Capsella grandiflora individuals and quantify the amount of selection across the genome. Using an estimate of the distribution of fitness effects we show that selection is strong in coding regions, but weak in most noncoding regions with the exception of 5’ and 3’ untranslated regions (UTRs). However, estimates of selection in noncoding regions conserved across the Brassicaceae family show strong signals of selection. Additionally, we see reductions in neutral diversity around functional substitutions in both coding and conserved noncoding regions, indicating recent selective sweeps at these sites. Finally, using expression data from leaf tissue we show that genes that are more highly expressed experience stronger negative selection but comparable levels of positive selection to lowly expressed genes.

Discovering functional DNA elements using population genomic information: A proof of concept using human mtDNA

Daniel R. Schrider, Andrew D. Kern
Subjects: Populations and Evolution (q-bio.PE); Genomics (q-bio.GN)

Identifying the complete set of functional elements within the human genome would be a windfall for multiple areas of biological research including medicine, molecular biology, and evolution. Complete knowledge of function would aid in the prioritization of loci when searching for the genetic bases of disease or adaptive phenotypes. Because mutations that disrupt function are disfavored by natural selection, purifying selection leaves a detectable signature within functional elements; accordingly this signal has been exploited through the use of genomic comparisons of distantly related species. However, the functional complement of the genome changes extensively across time and between lineages; therefore, evidence of the current action of purifying selection is essential. Because the removal of deleterious mutations by natural selection also reduces within-species genetic diversity within functional loci, dense population genetic data have the potential to reveal genomic elements that are currently functional. Here we assess the potential of this approach using 16,411 human mitochondrial genomes. We show that the high density of polymorphism in this dataset precisely delineates regions experiencing purifying selection. Further, we show that the number of segregating alleles at a site is strongly correlated with its divergence across species after accounting for known mutational biases in human mtDNA. These two measures track one another at a remarkably fine scale across many loci, a correlation that is purely the result of natural selection. Our results demonstrate that genetic variation has the potential to reveal exactly which nucleotides in the genome are currently performing important functions and likely to have deleterious fitness effects when mutated. As more complete genomes are sequenced, similar power to reveal purifying selection may be achievable in the human nuclear genome.

The fixation time of a strongly beneficial allele in a structured population

Andreas Greven, Peter Pfaffelhuber, Cornelia Pokalyuk, Anton Wakolbinger
Comments: 41 pages, 4 figures
Subjects: Probability (math.PR); Populations and Evolution (q-bio.PE)

For a beneficial allele which enters a large unstructured population and eventually goes to fixation, it is known that the time to fixation is approximately $2\log(\alpha)/\alpha$ for a large selection coefficient $\alpha$. In the presence of spatial structure with migration between colonies we detect various regimes of the migration rate $\mu$ for which the fixation times have different asymptotics as $\alpha \to \infty$. If $\mu$ is of order $\alpha$, the allele fixes (as in the spatially unstructured case) in time $\sim 2\log(\alpha)/\alpha$. If $\mu$ is of order $\alpha^p, 0\leq p \leq 1$, the fixation time is $\sim (2 + (1-p)d) \log(\alpha)/\alpha$, where $d$ is the maximum of the migration steps that are required from the colony where the beneficial allele entered to any other colony. If $\mu = 1/\log(\alpha)$, the fixation time is $\sim (2+S)\log(\alpha)/\alpha$, where $S$ is a random time in a simple epidemic model. The main idea for our analysis is to combine a new moment dual for the process conditioned to fixation with the time reversal in equilibrium of a spatial version of Neuhauser and Krone’s ancestral selection graph.
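The leading-order fixation times quoted in the abstract can be evaluated with a short sketch. It assumes that $\log$ denotes the natural logarithm and that the formulas are used only as asymptotic approximations for large $\alpha$; the function names and the numerical values of $\alpha$, $p$ and $d$ are hypothetical and chosen purely for illustration.

```python
import math


def fixation_time_unstructured(alpha):
    """Leading-order fixation time 2*log(alpha)/alpha (natural log assumed)."""
    if alpha <= 1.0:
        raise ValueError("the asymptotic formula is meant for large alpha")
    return 2.0 * math.log(alpha) / alpha


def fixation_time_structured(alpha, p, d):
    """Leading-order fixation time (2 + (1 - p) * d) * log(alpha) / alpha.

    p: exponent of the migration rate, mu of order alpha**p, with 0 <= p <= 1.
    d: maximum number of migration steps from the colony where the allele
       entered to any other colony.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if alpha <= 1.0:
        raise ValueError("the asymptotic formula is meant for large alpha")
    return (2.0 + (1.0 - p) * d) * math.log(alpha) / alpha


alpha = 1000.0  # hypothetical large selection coefficient
d = 3           # hypothetical graph distance between colonies

for p in (0.0, 0.5, 1.0):
    ratio = fixation_time_structured(alpha, p, d) / fixation_time_unstructured(alpha)
    print(f"p = {p}: slowdown relative to the unstructured case = {ratio}")
```

At $p = 1$ the structured formula reduces to the unstructured one, and the slowdown factor $(2 + (1-p)d)/2$ does not depend on $\alpha$ at this order.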

Extensive epistasis within the MHC contributes to the genetic architecture of celiac disease

Ben Goudey, Gad Abraham, Eder Kikianty, Qiao Wang, Dave Rawlinson, Fan Shi, Izhak Haviv, Linda Stern, Adam Kowalczyk, Michael Inouye

Epistasis has long been thought to contribute to the genetic aetiology of complex diseases, yet few robust epistatic interactions in humans have been detected. We have conducted exhaustive genome-wide scans for pairwise epistasis in five independent celiac disease (CeD) case-control studies, using a rapid model-free approach to examine over 500 billion SNP pairs in total. We found extensive epistasis within the MHC region with 7,270 statistically significant pairs achieving stringent replication criteria across multiple studies. These robust epistatic pairs partially tagged CeD risk HLA haplotypes, and replicable evidence for epistatic SNPs outside the MHC was not observed. Both within and between European populations, we observed striking consistency of epistatic models and epistatic model distribution, thus providing empirical estimates of their frequencies in a complex disease. Within the UK population, models of CeD comprised of both epistatic and additive single-SNP effects increased explained CeD variance by approximately 1% over those of single SNPs. Further analysis showed that additive SNP effects tag epistatic effects (and vice versa), sometimes involving SNPs separated by a megabase or more. These findings show that the genetic architecture of CeD consists of overlapping additive and epistatic components, indicating that the genetic architecture of CeD, and potentially other common autoimmune diseases, is more complex than previously thought.