# Adaptive evolution of molecular phenotypes

Torsten Held, Armita Nourmohammad, Michael Lässig
(Submitted on 7 Mar 2014)

Molecular phenotypes link genomic information with organismic functions, fitness, and evolution. Quantitative traits are complex phenotypes that depend on multiple genomic loci. In this paper, we study the adaptive evolution of a quantitative trait under time-dependent selection, which arises from environmental changes or through fitness interactions with other co-evolving phenotypes. We analyze a model of trait evolution under mutations and genetic drift in a single-peak fitness seascape. The fitness peak performs a constrained random walk in the trait amplitude, which determines the time-dependent trait optimum in a given population. We derive analytical expressions for the distribution of the time-dependent trait divergence between populations and of the trait diversity within populations. Based on this solution, we develop a method to infer adaptive evolution of quantitative traits. Specifically, we show that the ratio of the average trait divergence and the diversity is a universal function of evolutionary time, which predicts the stabilizing strength and the driving rate of the fitness seascape. From an information-theoretic point of view, this function measures the macro-evolutionary entropy in a population ensemble, which determines the predictability of the evolutionary process. Our solution also quantifies two key characteristics of adapting populations: the cumulative fitness flux, which measures the total amount of adaptation, and the adaptive load, which is the fitness cost due to a population’s lag behind the fitness peak.

# The limits of selection under plant domestication

Robin G. Allaby, Dorian Q. Fuller, James L. Kitchen
Subjects: Populations and Evolution (q-bio.PE)

Plant domestication involved a process of selection through human agency of a series of traits collectively termed the domestication syndrome. Current debate concerns the pace at which domesticated plants emerged from cultivated wild populations and how many genes were involved. Here we present simulations that test how many genes could have been involved by considering the cost of selection. We demonstrate the selection load that can be endured by populations increases with decreasing selection coefficients and greater numbers of loci down to values of about s = 0.005, causing a driving force that increases the number of loci under selection. As the number of loci under selection increases, an effect of co-selection increases resulting in individual unlinked loci being fixed more rapidly in out-crossing populations, representing a second driving force to increase the number of loci under selection. In inbreeding systems co-selection results in interference and reduced rates of fixation but does not reduce the size of the selection load that can be endured. These driving forces result in an optimum pace of genome evolution in which 50-100 loci are the most that could be under selection in a cultivation regime. Furthermore, the simulations do not preclude the existence of selective sweeps but demonstrate that they come at a cost of the selection load that can be endured and consequently a reduction of the capacity of plants to adapt to new environments, which may contribute to the explanation of why selective sweeps have been so rarely detected in genome studies.

# Can one hear the shape of a population history?

Junhyong Kim, Elchanan Mossel, Miklós Z. Rácz, Nathan Ross
(Submitted on 11 Feb 2014)

Reconstructing past population size from present day genetic data is a major goal of population genetics. Recent empirical studies infer population size history using coalescent-based models applied to a small number of individuals. While it is known that the allelic spectrum is not sufficient to infer the population size history, the distribution of coalescence times is. Here we provide tight bounds on the amount of information needed to recover the population size history at a certain level of accuracy assuming data given either by exact coalescence times, or given blocks of non-recombinant DNA sequences whose loci have approximately equal times to coalescence. Importantly, we prove lower bounds showing that it is impossible to accurately deduce population histories given limited data.

# The fixation time of a strongly beneficial allele in a structured population

Andreas Greven, Peter Pfaffelhuber, Cornelia Pokalyuk, Anton Wakolbinger
Comments: 41 pages, 4 figures
Subjects: Probability (math.PR); Populations and Evolution (q-bio.PE)

For a beneficial allele which enters a large unstructured population and eventually goes to fixation, it is known that the time to fixation is approximately $2\log(\alpha)/\alpha$ for a large selection coefficient $\alpha$. In the presence of spatial structure with migration between colonies we detect various regimes of the migration rate $\mu$ for which the fixation times have different asymptotics as $\alpha \to \infty$. If $\mu$ is of order $\alpha$, the allele fixes (as in the spatially unstructured case) in time $\sim 2\log(\alpha)/\alpha$. If $\mu$ is of order $\alpha^p$ with $0\leq p \leq 1$, the fixation time is $\sim (2 + (1-p)d) \log(\alpha)/\alpha$, where $d$ is the maximum number of migration steps that are required from the colony where the beneficial allele entered to any other colony. If $\mu = 1/\log(\alpha)$, the fixation time is $\sim (2+S)\log(\alpha)/\alpha$, where $S$ is a random time in a simple epidemic model. The main idea for our analysis is to combine a new moment dual for the process conditioned to fixation with the time reversal in equilibrium of a spatial version of Neuhauser and Krone’s ancestral selection graph.
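To make the regimes concrete, the leading-order expressions quoted in the abstract can be evaluated directly. The following is a minimal sketch, not code from the paper: the function names are hypothetical, the random-time regime $\mu = 1/\log(\alpha)$ is left out because $S$ is not specified here, and the results are only meaningful for large $\alpha$.

```python
import math


def fixation_time_unstructured(alpha: float) -> float:
    """Leading-order fixation time 2*log(alpha)/alpha (large alpha)."""
    if alpha <= 1.0:
        raise ValueError("the asymptotic formula assumes alpha > 1")
    return 2.0 * math.log(alpha) / alpha


def fixation_time_structured(alpha: float, p: float, d: int) -> float:
    """Leading-order fixation time (2 + (1-p)*d)*log(alpha)/alpha.

    Assumes a migration rate mu of order alpha**p with 0 <= p <= 1, and
    d = maximum number of migration steps from the colony where the
    allele entered to any other colony (as described in the abstract).
    """
    if alpha <= 1.0:
        raise ValueError("the asymptotic formula assumes alpha > 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if d < 0:
        raise ValueError("d must be non-negative")
    return (2.0 + (1.0 - p) * d) * math.log(alpha) / alpha


if __name__ == "__main__":
    alpha = 1000.0
    print(fixation_time_unstructured(alpha))
    # Slower migration (smaller p) or a wider colony graph (larger d)
    # lengthens the fixation time relative to the unstructured case.
    for p in (1.0, 0.5, 0.0):
        print(p, fixation_time_structured(alpha, p, d=3))
```

Setting `p = 1` recovers the unstructured result, which matches the abstract's statement that the allele fixes as in the unstructured case when $\mu$ is of order $\alpha$.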

# The roles of standing genetic variation and evolutionary history in determining the evolvability of anti-predator strategies

Jordan Fish, Daniel R O’Donnell, Abhijna Parigi, Ian Dworkin, Aaron P Wagner

Standing genetic variation and the historical environment in which that variation arises (evolutionary history) are both potentially significant determinants of a population’s capacity for evolutionary response to a changing environment. We evaluated the relative importance of these two factors in influencing evolutionary trajectories in the face of sudden environmental change. We used the open-ended digital evolution software Avida to examine how historic exposure to predation pressures, different levels of genetic variation, and combinations of the two, impact anti-predator strategies and competitive abilities evolved in the face of threats from new, invasive, predator populations. We show that while standing genetic variation plays some role in determining evolutionary responses, evolutionary history has the greater influence on a population’s capacity to evolve effective anti-predator traits. This adaptability likely reflects the relative ease of repurposing existing, relevant genes and traits, and the broader potential value of the generation and maintenance of adaptively flexible traits in evolving populations.

# The arrival of the frequent: how bias in genotype-phenotype maps can steer populations to local optima

Ard A Louis, Steffen Schaper
(Submitted on 6 Feb 2014)

Genotype-phenotype (GP) maps specify how the random mutations that change genotypes generate variation by altering phenotypes, which, in turn, can trigger selection. Many GP maps share the following general properties: 1) The number of genotypes $N_G$ is much larger than the number of selectable phenotypes; 2) Neutral exploration changes the variation that is accessible to the population; 3) The distribution of phenotype frequencies $F_p = N_p/N_G$, with $N_p$ the number of genotypes mapping onto phenotype $p$, is highly biased: the majority of genotypes map to only a small minority of the phenotypes. Here we explore how these properties affect the evolutionary dynamics of haploid Wright-Fisher models that are coupled to a simplified and general random GP map or to a more complex RNA sequence to secondary structure map. For both maps the probability of a mutation leading to a phenotype $p$ scales to first order as $F_p$, although for the RNA map there are further correlations as well. By using mean-field theory, supported by computer simulations, we show that the discovery time $T_p$ of a phenotype $p$ similarly scales to first order as $1/F_p$ for a wide range of population sizes and mutation rates in both the monomorphic and polymorphic regimes. The rates at which variation arises can therefore differ by many orders of magnitude. Phenotypic variation with a larger $F_p$ is much more likely to arise than variation with a small $F_p$. We show, using the RNA model, that frequent phenotypes (with larger $F_p$) can fix in a population even when alternative, but less frequent, phenotypes with much higher fitness are potentially accessible. In other words, if the fittest never ‘arrive’ on the timescales of evolutionary change, then they can’t fix. We call this highly non-ergodic effect the ‘arrival of the frequent’.
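The first-order $1/F_p$ scaling can be illustrated with a toy waiting-time model. This is a deliberately simplified sketch and not the paper's mean-field theory or simulation: it assumes that every mutation independently produces phenotype $p$ with probability $F_p$, and it ignores population size, mutation rate, neutral-network structure and the RNA-specific correlations. The function names are hypothetical.

```python
import random


def discovery_time(f_p: float, rng: random.Random) -> int:
    """Number of mutations until phenotype p first appears.

    Toy assumption: each mutation hits phenotype p independently with
    probability f_p, so the waiting time is geometric with mean 1/f_p.
    """
    if not 0.0 < f_p <= 1.0:
        raise ValueError("f_p must lie in (0, 1]")
    trials = 1
    while rng.random() >= f_p:  # this mutation missed phenotype p
        trials += 1
    return trials


def mean_discovery_time(f_p: float, samples: int = 5000, seed: int = 0) -> float:
    """Average discovery time over independent replicate runs."""
    rng = random.Random(seed)
    return sum(discovery_time(f_p, rng) for _ in range(samples)) / samples


if __name__ == "__main__":
    # A tenfold rarer phenotype takes roughly ten times longer to appear.
    for f_p in (0.1, 0.01):
        print(f_p, mean_discovery_time(f_p), 1.0 / f_p)
```

Under this toy assumption, a phenotype whose frequency is smaller by several orders of magnitude takes correspondingly longer to appear, which is the intuition behind the statement that the fittest phenotype may never arrive on the relevant timescale.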

# Footprints of ancient balanced polymorphisms in genetic variation data

Ziyue Gao, Molly Przeworski, Guy Sella
(Submitted on 29 Jan 2014)

When long-lived, balancing selection can lead to trans-species polymorphisms that are shared by two or more species identical by descent. In this case, the gene genealogies at the selected sites cluster by allele instead of by species and, because of linkage, nearby neutral sites also have unusual genealogies. Although it is clear that this scenario should lead to discernible footprints in genetic variation data, notably the presence of additional neutral polymorphisms shared between species and the absence of fixed differences, the effects remain poorly characterized. We focus on the case of a single site under long-lived balancing selection and derive approximations for summaries of the data that are sensitive to a trans-species polymorphism: the length of the segment that carries most of the signals, the expected number of shared neutral SNPs within the segment and the patterns of allelic associations among them. Coalescent simulations of ancient balancing selection confirm the accuracy of our approximations. We further show that for humans and chimpanzees, and more generally for pairs of species with low genetic diversity levels, the patterns of genetic variation on which we focus are highly unlikely to be generated by neutral recurrent mutations, so these statistics are specific as well as sensitive. We discuss the implications of our results for the design and interpretation of genome scans for ancient balancing selection in apes and other taxa.

# The evolution of moment generating functions for the Wright Fisher model of population genetics

Tat Dat Tran, Julian Hofrichter, Juergen Jost
(Submitted on 21 Jan 2014)

We derive and apply a partial differential equation for the moment generating function of the Wright-Fisher model of population genetics.
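As an illustration of what such an equation can look like (a standard textbook form under stated assumptions, not necessarily the equation derived in the paper), consider the neutral two-allele Wright-Fisher diffusion for the allele frequency $X_t$, with time rescaled so that the generator is $\tfrac{1}{2}x(1-x)\,\partial_x^2$. Writing $M(t,s) = \mathbb{E}\left[e^{sX_t}\right]$ and applying the generator to $e^{sx}$ gives

$$
\frac{\partial M}{\partial t}
= \mathbb{E}\left[\tfrac{1}{2}X_t(1-X_t)\,s^2 e^{sX_t}\right]
= \frac{s^2}{2}\left(\frac{\partial M}{\partial s} - \frac{\partial^2 M}{\partial s^2}\right),
$$

where the second equality uses $\mathbb{E}\left[X_t e^{sX_t}\right] = \partial_s M$ and $\mathbb{E}\left[X_t^2 e^{sX_t}\right] = \partial_s^2 M$. Expanding $M$ in powers of $s$ turns this into a hierarchy of ordinary differential equations for the moments of $X_t$.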

# Coalescence 2.0: a multiple branching of recent theoretical developments and their applications

Aurelien Tellier, Christophe Lemaire
(Submitted on 21 Jan 2014)

Population genetics theory has laid the foundations for genomics analyses including the recent burst in genome scans for selection and statistical inference of past demographic events in many prokaryote, animal and plant species. Identifying SNPs under natural selection and underpinning species adaptation relies on disentangling the respective contribution of random processes (mutation, drift, migration) from that of selection on nucleotide variability. Most theory and statistical tests have been developed using the Kingman coalescent theory based on the Wright-Fisher population model. However, these theoretical models rely on biological and life-history assumptions which may be violated in many prokaryote, fungal, animal or plant species. Recent theoretical developments of the so-called multiple merger coalescent models are reviewed here ($\Lambda$-coalescent, beta-coalescent, Bolthausen-Sznitman, $\Xi$-coalescent). We explain how these new models take into account various pervasive ecological and biological characteristics, life history traits or life cycles which were not accounted for in previous theories, such as 1) the skew in offspring production typical of marine species, 2) fast adapting microparasites (virus, bacteria and fungi) exhibiting large variation in population sizes during epidemics, 3) the peculiar life cycles of fungi and bacteria alternating sexual and asexual cycles, and 4) the high rates of extinction-recolonization in spatially structured populations. We finally discuss the relevance of multiple merger models for the detection of SNPs under selection in these species and for population genomics of very large sample sizes, and advocate potentially re-examining the conclusions of previous population genetics studies.

# The existence and abundance of ghost ancestors in biparental populations

Simon Gravel, Mike Steel
(Submitted on 15 Jan 2014)

In a randomly-mating biparental population of size $N$ there are, with high probability, individuals who are genealogical ancestors of every extant individual within approximately $\log_2(N)$ generations into the past. We use this result of Chang to prove a curious corollary under standard models of recombination: there exist, with high probability, individuals within a constant multiple of $\log_2(N)$ generations into the past who are simultaneously (i) genealogical ancestors of *each* of the individuals at the present, and (ii) genetic ancestors to *none* of the individuals at the present. Such ancestral individuals (ancestors of everyone today who left no genetic trace) represent ‘ghost’ ancestors in a strong sense. In this short note, we use a simple analytical argument and simulations to estimate how many such individuals exist in Wright-Fisher populations.
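The $\log_2(N)$ timescale of Chang's result can be checked with a small genealogical simulation. The sketch below is an illustration only and not the authors' code: it assumes a constant-size population in which every individual picks two parents uniformly at random (with replacement) from the previous generation, it tracks genealogical ancestry only, and it does not model recombination, so it cannot identify ghost ancestors. The function name is hypothetical.

```python
import math
import random


def generations_to_common_ancestor(n: int, rng: random.Random,
                                   max_generations: int = 1000) -> int:
    """Generations back until someone is an ancestor of all n individuals."""
    full = (1 << n) - 1
    # descendants[i] is a bitmask of the present-day individuals that
    # descend from individual i of the generation currently considered.
    descendants = [1 << i for i in range(n)]
    for generation in range(1, max_generations + 1):
        parents = [0] * n
        for mask in descendants:
            # Each individual draws two parents uniformly at random.
            for _ in range(2):
                parents[rng.randrange(n)] |= mask
        if any(mask == full for mask in parents):
            return generation
        descendants = parents
    raise RuntimeError("no common ancestor found within max_generations")


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (64, 256, 1024):
        t = generations_to_common_ancestor(n, rng)
        print(n, t, math.log2(n))
```

Because the agreement with $\log_2(N)$ is asymptotic, individual runs at small $N$ are expected to deviate from it; averaging over replicates and increasing `n` gives a fairer comparison.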