Quantitative analyses of empirical fitness landscapes

Ivan G. Szendro, Martijn F. Schenk, Jasper Franke, Joachim Krug, J. Arjan G. M. de Visser
(Submitted on 20 Feb 2012 (v1), last revised 17 Oct 2012 (this version, v2))

The concept of a fitness landscape is a powerful metaphor that offers insight into various aspects of evolutionary processes and guidance for the study of evolution. Until recently, empirical evidence on the ruggedness of these landscapes was lacking, but since it became feasible to construct all possible genotypes containing combinations of a limited set of mutations, the number of studies has grown to a point where a classification of landscapes becomes possible. The aim of this review is to identify measures of epistasis that allow a meaningful comparison of fitness landscapes and then apply them to the empirical landscapes to discern factors that affect ruggedness. The various measures of epistasis that have been proposed in the literature appear to be equivalent. Our comparison shows that the ruggedness of the empirical landscape is affected by whether the included mutations are beneficial or deleterious and by whether intra- or intergenic epistasis is involved. Finally, the empirical landscapes are compared to landscapes generated with the Rough Mt. Fuji model. Despite the simplicity of this model, it captures the features of the experimental landscapes remarkably well.
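To make the Rough Mt. Fuji model concrete, here is a minimal sketch of one common formulation. It assumes that fitness falls by a fixed slope c for every mutation away from a reference genotype, and that an independent Gaussian term with standard deviation sigma is added to each genotype. The abstract does not spell out the model, so the function names, the Gaussian noise, and the use of the number of local maxima as a ruggedness measure are illustrative assumptions rather than the authors' definitions.

```python
import itertools
import random


def rough_mount_fuji(L, c, sigma, seed=0):
    """Build a landscape over all 2**L binary genotypes (hypothetical helper).

    Fitness = -c * (Hamming distance to the all-zeros reference) + Gaussian noise.
    """
    rng = random.Random(seed)
    landscape = {}
    for genotype in itertools.product((0, 1), repeat=L):
        distance = sum(genotype)  # number of loci differing from the reference
        landscape[genotype] = -c * distance + rng.gauss(0.0, sigma)
    return landscape


def count_local_maxima(landscape):
    """Count genotypes that are fitter than every single-mutation neighbour."""
    count = 0
    for genotype, fitness in landscape.items():
        neighbours = (
            genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
            for i in range(len(genotype))
        )
        if all(fitness > landscape[nb] for nb in neighbours):
            count += 1
    return count


if __name__ == "__main__":
    for sigma in (0.0, 0.5, 2.0):
        peaks = count_local_maxima(rough_mount_fuji(L=6, c=1.0, sigma=sigma))
        print(f"sigma={sigma}: {peaks} local maxima")
```

With sigma set to zero the landscape is purely additive and has a single peak at the reference genotype. Increasing sigma relative to c makes the random component dominate, which tends to add peaks.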

The equivalence between weak and strong purifying selection

Benjamin H Good, Michael M Desai
(Submitted on 16 Oct 2012)

Weak purifying selection, acting on many linked mutations, may play a major role in shaping patterns of molecular evolution in natural populations. Yet efforts to infer these effects from DNA sequence data are limited by our incomplete understanding of weak selection on local genomic scales. Here, we demonstrate a natural symmetry between weak and strong selection, in which the effects of many weakly selected mutations on patterns of molecular evolution are equivalent to a smaller number of more strongly selected mutations. By introducing a coarse-grained “effective selection coefficient,” we derive an explicit mapping between weakly selected populations and their strongly selected counterparts, which allows us to make accurate and efficient predictions across the full range of selection strengths. This suggests that an effective selection coefficient and effective mutation rate — not an effective population size — is the most accurate summary of the effects of selection over locally linked regions. Moreover, this correspondence places fundamental limits on our ability to resolve the effects of weak selection from contemporary sequence data alone.

Fluctuations in fitness distributions and the effects of weak linked selection on sequence evolution

Benjamin H. Good, Michael M. Desai
(Submitted on 15 Oct 2012)

Evolutionary dynamics and patterns of molecular evolution are strongly influenced by selection on linked regions of the genome, but our quantitative understanding of these effects remains incomplete. Recent work has focused on predicting the distribution of fitness within an evolving population, and this forms the basis for several methods that leverage the fitness distribution to predict the patterns of genetic diversity when selection is strong. However, in weakly selected populations random fluctuations due to genetic drift are more severe, and neither the distribution of fitness nor the sequence diversity within the population are well understood. Here, we briefly review the motivations behind the fitness-distribution picture, and summarize the general approaches that have been used to analyze this distribution in the strong-selection regime. We then extend these approaches to the case of weak selection, by outlining a perturbative treatment of selection at a large number of linked sites. This allows us to quantify the stochastic behavior of the fitness distribution and yields exact analytical predictions for the sequence diversity and substitution rate in the limit that selection is weak.

Birth and death processes with neutral mutations

Nicolas Champagnat, Amaury Lambert, Mathieu Richard
(Submitted on 27 Sep 2012)

In this paper, we review recent results of ours concerning branching processes with general lifetimes and neutral mutations, under the infinitely many alleles model, where mutations can occur either at birth of individuals or at a constant rate during their lives.
In both models, we study the allelic partition of the population at time t. We give closed formulae for the expected frequency spectrum at t and prove pathwise convergence to an explicit limit, as t goes to infinity, of the relative numbers of types younger than some given age and carried by a given number of individuals (small families). We also provide convergences in distribution of the sizes or ages of the largest families and of the oldest families.
In the case of exponential lifetimes, population dynamics are given by linear birth and death processes, and we can most of the time provide general formulations of our results unifying both models.

On The External Branches Of Coalescent Processes With Multiple Collisions With An Emphasis On The Bolthausen-Sznitman Coalescent

Jean-Stephane Dhersin (IG, LAGA), Martin Moehle
(Submitted on 15 Sep 2012)

A recursion for the joint moments of the external branch lengths for coalescents with multiple collisions (\Lambda-coalescents) is provided. This recursion is used to derive asymptotic expansions as the sample size n tends to infinity for the moments of the total external branch length of the Bolthausen–Sznitman coalescent. The proof is based on an elementary difference method. An alternative differential equation method is developed which can be used to obtain exact solutions for the joint moments of the external branch lengths for the Bolthausen–Sznitman coalescent. The results show, for example, that the lengths of two randomly chosen external branches are positively correlated for the Bolthausen–Sznitman coalescent, whereas they are negatively correlated for the Kingman coalescent provided that n \ge 4.

Analysis of DNA sequence variation within marine species using Beta-coalescents

Matthias Steinrücken, Matthias Birkner, Jochen Blath
(Submitted on 4 Sep 2012)

We apply recently developed inference methods based on general coalescent processes to DNA sequence data obtained from various marine species. Several of these species are believed to exhibit so-called shallow gene genealogies, potentially due to extreme reproductive behaviour, e.g. via Hedgecock’s “reproduction sweepstakes”. Besides the data analysis, in particular the inference of mutation rates and the estimation of the (real) time to the most recent common ancestor, we briefly address the question whether the genealogies might be adequately described by so-called Beta coalescents (as opposed to Kingman’s coalescent), allowing multiple mergers of genealogies.
The choice of the underlying coalescent model for the genealogy has drastic implications for the estimation of the above quantities, in particular the real-time embedding of the genealogy.

Finite populations with frequency-dependent selection: a genealogical approach

Peter Pfaffelhuber, Benedikt Vogt
(Submitted on 28 Jul 2012)

Evolutionary models for populations of constant size are frequently studied using the Moran model, the Wright-Fisher model, or their diffusion limits. When evolution is neutral, a random genealogy given through Kingman’s coalescent is used in order to understand basic properties of such models. Here, we address the use of a genealogical perspective for models with weak frequency-dependent selection, i.e. N s =: \alpha is small, where s is the fitness advantage of a fit individual and N is the population size. When computing fixation probabilities, this leads either to the approach proposed by Rousset (2003), who argues how to use Kingman’s coalescent for weak selection, or to extensions of the ancestral selection graph of Neuhauser and Krone (1997) and Neuhauser (1999). As an application, we re-derive the one-third law of evolutionary game theory (Nowak et al., 2004). In addition, we provide the approximate distribution of the genealogical distance of two randomly sampled individuals under linear frequency-dependence.
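The one-third law mentioned above can be checked numerically. The sketch below does not follow the genealogical derivation of the paper. Instead it evaluates the standard closed-form fixation probability of a single mutant in a frequency-dependent Moran process, assuming the usual two-strategy game with payoffs a, b (mutant against mutant, mutant against resident) and c, d (resident against mutant, resident against resident) and a selection intensity w. The function name and the payoff values are illustrative assumptions.

```python
def moran_fixation_probability(a, b, c, d, N, w):
    """Fixation probability of one A mutant among N - 1 B residents.

    Fitness with i mutants present (self-interaction excluded):
      f_i = 1 - w + w * (a*(i-1) + b*(N-i)) / (N-1)   for an A individual
      g_i = 1 - w + w * (c*i + d*(N-i-1)) / (N-1)     for a B individual
    rho = 1 / (1 + sum_{k=1}^{N-1} prod_{i=1}^{k} g_i / f_i)
    """
    total = 1.0
    product = 1.0
    for i in range(1, N):
        f_i = 1 - w + w * (a * (i - 1) + b * (N - i)) / (N - 1)
        g_i = 1 - w + w * (c * i + d * (N - i - 1)) / (N - 1)
        product *= g_i / f_i
        total += product
    return 1.0 / total


if __name__ == "__main__":
    N, w = 100, 0.001
    # Unstable mixed equilibrium at x* = (d - b) / (a - b - c + d).
    for a, b, c, d in [(10, 1, 3, 2), (3, 1, 2, 3)]:
        x_star = (d - b) / (a - b - c + d)
        rho = moran_fixation_probability(a, b, c, d, N, w)
        print(f"x*={x_star:.3f}  rho*N={rho * N:.4f}")
```

In the first game the unstable equilibrium lies below 1/3, and the mutant fixes more often than a neutral one (rho above 1/N). In the second it lies above 1/3, and the mutant fixes less often.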

Our paper: Genealogies of rapidly adapting populations

[This author post is by Richard Neher on his paper with Oskar Hallatschek: Genealogies of rapidly adapting populations arXived here.]


That selection distorts genealogies is a well-known fact, but properties of genealogies shaped by selection are poorly understood. We set out to investigate genealogies in a simple model of rapid adaptation in asexuals: The fitness of individuals is changed by small amounts through frequent mutation, while the overall population size is kept constant by a carrying capacity. We simulated the model and tracked genealogies.

The genealogies we found have two striking features incompatible with the standard neutral coalescent: (i) Many lineages merge almost simultaneously. (ii) Forward in time, the trees often branch very asymmetrically, i.e., almost the entire population descends from one branch while the other branches share the remaining minority. Using branching process approximations and a mapping to range expansion problems (see Brunet et al. (2007)), we show that the genealogies are similar to those expected from the Bolthausen-Sznitman coalescent (BSC), a special case of multiple merger coalescents. Very similar conclusions have been reached in another recent preprint by Desai, Walczak and Fisher. The BSC is well studied and we can build on many results from the mathematical literature.
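To illustrate how multiple mergers differ from pairwise coalescence, here is a minimal sketch that tracks only the number of ancestral lineages back in time. It uses the standard rates of the two processes: with b lineages, Kingman's coalescent merges one pair at total rate b(b-1)/2, while the BSC has total merger rate b-1 and merges k of the b lineages with probability b/((b-1)k(k-1)). This is not the simulation used in the paper, the function names are hypothetical, and the two time scales are not directly comparable.

```python
import random
from fractions import Fraction


def bsc_merger_size_probs(b):
    """P(next merger joins k of b lineages), k = 2..b, for the BSC."""
    return {k: Fraction(b, (b - 1) * k * (k - 1)) for k in range(2, b + 1)}


def simulate_lineage_counts(n, model, rng):
    """Return [(time, lineages), ...] from n sampled lineages down to 1."""
    b, t = n, 0.0
    path = [(t, b)]
    while b > 1:
        if model == "kingman":
            rate = b * (b - 1) / 2  # only pairwise mergers occur
            k = 2
        elif model == "bsc":
            rate = b - 1  # total rate over all merger sizes
            probs = bsc_merger_size_probs(b)
            k = rng.choices(
                list(probs), weights=[float(p) for p in probs.values()]
            )[0]
        else:
            raise ValueError(f"unknown model: {model}")
        t += rng.expovariate(rate)
        b -= k - 1  # k lineages collapse into one ancestor
        path.append((t, b))
    return path


if __name__ == "__main__":
    rng = random.Random(1)
    for model in ("kingman", "bsc"):
        path = simulate_lineage_counts(50, model, rng)
        print(model, "merger events:", len(path) - 1)
```

A sample of n lineages always needs exactly n-1 merger events under Kingman's coalescent. Under the BSC it can reach the common ancestor in fewer events, because a single event may join many lineages at once.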

The difference between Kingman and multiple merger coalescence is closely related to the distinct stochastic properties of genetic drift and draft. While drift describes short term fluctuations in offspring number which are bounded, draft refers to stochasticity through linked selection. Draft can result in fluctuations of the same order as the population size. Even if very rare, such large fluctuations are important. Lumping drift and draft together and labeling the result as effective population size is rarely helpful and often confusing.

Why should we care? We often want to learn about past dynamics from snapshots of populations (sequence samples). To this end, we compare the diversity patterns in the sample to model predictions and infer model parameters. If we use an inappropriate model, we get meaningless answers. Furthermore, some events that are very unlikely under Kingman’s coalescent are quite common when multiple mergers are allowed. Consider for example a lone haplotype in a large sample that connects to the root of the tree. This is very unlikely in neutral coalescent models and one might take it as evidence for immigration from a diverged population. If multiple mergers dominate coalescence, this does not come as a surprise. Similarly, an excess of singletons is not necessarily evidence for expanding populations or deleterious mutations but might be due to draft. I wonder whether more potential pitfalls of this sort exist.

Richard Neher

Genealogies of rapidly adapting populations

Richard A. Neher, Oskar Hallatschek
(Submitted on 15 Aug 2012)

The genetic diversity of a species is shaped by its recent evolutionary history and can be used to infer demographic events or selective sweeps. Most inference methods are based on the null hypothesis that natural selection is a weak evolutionary force. However, many species, particularly pathogens, are under continuous pressure to adapt in response to changing environments. A statistical framework for inference from diversity data of such populations is currently lacking. Toward this goal, we explore the properties of genealogies that emerge from models of continual adaptation. We show that lineages trace back to a small pool of highly fit ancestors, in which simultaneous coalescence of more than two lineages frequently occurs. While such multiple mergers are unlikely under the neutral coalescent, they create a unique genetic footprint in adapting populations. The site frequency spectrum of derived neutral alleles, for example, is non-monotonic and has a peak at high frequencies, whereas Tajima’s D becomes more and more negative with increasing sample size. Since multiple merger coalescents emerge in various evolutionary scenarios characterized by sustained selection pressures, we argue that they should be considered as null-models for adapting populations.
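For readers unfamiliar with the statistic, the following sketch computes Tajima's D from an unfolded site frequency spectrum using the standard Tajima (1989) normalisation. It only shows how D responds to the shape of the spectrum and does not reproduce the paper's predictions. The function name and the example spectra are illustrative assumptions.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS.

    sfs[i-1] is the number of sites whose derived allele is carried by
    i of the n sampled sequences, for i = 1..n-1.
    """
    n = len(sfs) + 1
    S = sum(sfs)  # number of segregating sites
    pairs = n * (n - 1) / 2
    # Mean pairwise difference: a site with i derived copies separates
    # i * (n - i) of the sequence pairs.
    pi = sum(x * i * (n - i) for i, x in enumerate(sfs, start=1)) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    n = 10
    neutral = [12 / i for i in range(1, n)]  # expected shape, proportional to 1/i
    singletons = [30] + [0] * (n - 2)        # all variants at frequency 1/n
    print("neutral-shaped SFS:", round(tajimas_d(neutral), 6))
    print("singletons only:   ", round(tajimas_d(singletons), 3))
```

A spectrum proportional to 1/i, the expectation under the neutral Kingman coalescent, gives D of zero up to rounding. An excess of rare variants pushes D below zero.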

Genetic Diversity and the Structure of Genealogies in Rapidly Adapting Populations.

Michael M. Desai, Aleksandra M. Walczak, Daniel S. Fisher
(Submitted on 16 Aug 2012)

Positive selection distorts the structure of genealogies and hence alters patterns of genetic variation within a population. Most analyses of these distortions focus on the signatures of hitchhiking due to hard or soft selective sweeps at a single genetic locus. However, in linked regions of rapidly adapting genomes, multiple beneficial mutations at different loci can segregate simultaneously within the population, an effect known as clonal interference. This leads to a subtle interplay between hitchhiking and interference effects, which leads to a unique signature of rapid adaptation on genetic variation both at the selected sites and at linked neutral loci. Here, we introduce an effective coalescent theory (a “fitness-class coalescent”) that describes how positive selection at many perfectly linked sites alters the structure of genealogies. We use this theory to calculate several simple statistics describing genetic variation within a rapidly adapting population, and to implement efficient backwards-time coalescent simulations which can be used to predict how clonal interference alters the expected patterns of molecular evolution.