# Gaussian process test for high-throughput sequencing time series: application to experimental evolution

Hande Topa, Ágnes Jónás, Robert Kofler, Carolin Kosiol, Antti Honkela
Subjects: Populations and Evolution (q-bio.PE); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM); Applications (stat.AP)

Motivation: Recent advances in high-throughput sequencing (HTS) have made it possible to monitor genomes in great detail. New experiments use HTS not only to measure genomic features at one time point but also to monitor them changing over time, with the aim of identifying significant changes in their abundance. In population genetics, for example, allele frequencies are monitored over time to detect significant frequency changes that indicate selection pressures. Previous attempts at analysing data from HTS experiments have been limited as they could not simultaneously include data at intermediate time points, replicate experiments and sources of uncertainty specific to HTS such as sequencing depth.
Results: We present the beta-binomial Gaussian process (BBGP) model for ranking features with significant non-random variation in abundance over time. The features are assumed to represent proportions, such as the proportion of an alternative allele in a population. We use the beta-binomial model to capture the uncertainty arising from finite sequencing depth and combine it with a Gaussian process model over the time series. In simulations that mimic the features of experimental evolution data, the proposed method clearly outperforms classical testing in average precision of finding selected alleles. We also present results on real data from a Drosophila experimental evolution experiment on temperature adaptation.
Availability: R software implementing the test is available at https://github.com/handetopa/BBGP.
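
The role of sequencing depth described above can be illustrated with a small sketch. This is not the BBGP implementation from the repository: it assumes a uniform Beta(1, 1) prior on the allele frequency (an assumption made here purely for illustration), and the function names and read counts are hypothetical. It shows that the same observed proportion carries very different uncertainty at different read depths, which is the kind of per-observation noise a time-series model can take into account.

```python
import math


def log_beta(a, b):
    """Log of the Beta function, computed via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, alpha, beta):
    """Log-probability of k alternative-allele reads out of n total reads
    when the allele frequency follows a Beta(alpha, beta) distribution."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def frequency_posterior(k, n, alpha=1.0, beta=1.0):
    """Posterior mean and variance of the allele frequency after observing
    k of n reads, assuming a Beta(alpha, beta) prior (uniform by default)."""
    a = alpha + k
    b = beta + (n - k)
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, variance


# Same observed proportion (50%), two hypothetical sequencing depths.
shallow_mean, shallow_var = frequency_posterior(k=5, n=10)
deep_mean, deep_var = frequency_posterior(k=50, n=100)
print(f"depth 10:  mean={shallow_mean:.3f}, variance={shallow_var:.5f}")
print(f"depth 100: mean={deep_mean:.3f}, variance={deep_var:.5f}")
```

Both depths give the same posterior mean, but the variance shrinks as depth grows, so a frequency change measured at low coverage should count as weaker evidence than the same change measured at high coverage.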

# An experimentally determined evolutionary model dramatically improves phylogenetic fit

All modern approaches to molecular phylogenetics require a quantitative model for how genes evolve. Unfortunately, existing evolutionary models do not realistically represent the site-heterogeneous selection that governs actual sequence change. Attempts to remedy this problem have involved augmenting these models with a burgeoning number of free parameters. Here I demonstrate an alternative: experimental determination of a parameter-free evolutionary model via mutagenesis, functional selection, and deep sequencing. Using this strategy, I create an evolutionary model for influenza nucleoprotein that describes the gene phylogeny far better than existing models with dozens or even hundreds of free parameters. High-throughput experimental strategies such as the one employed here provide fundamentally new information that has the potential to transform the sensitivity of phylogenetic analyses.

# Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity

Sergey Kryazhimskiy, Daniel Paul Rice, Elizabeth Jerison, Michael M Desai

Epistasis can make adaptation highly unpredictable, rendering evolutionary trajectories contingent on the chance effects of initial mutations. We used experimental evolution in Saccharomyces cerevisiae to quantify this effect, finding dramatic differences in adaptability between 64 closely related genotypes. Despite these differences, sequencing of 105 evolved clones showed no significant effect of initial genotype on future sequence-level evolution. Instead, reconstruction experiments revealed a consistent pattern of diminishing returns epistasis. Our results suggest that many beneficial mutations affecting a variety of biological processes are globally coupled: they interact strongly, but only through their combined effect on fitness. Sequence-level adaptation is thus highly stochastic. Nevertheless, fitness evolution is strikingly predictable because differences in adaptability are determined only by global fitness-mediated epistasis, not by the identity of individual mutations.

# Biophysical Fitness Landscapes for Transcription Factor Binding Sites

Allan Haldane, Michael Manhart, Alexandre V. Morozov
(Submitted on 3 Dec 2013)

Evolutionary trajectories and phenotypic states available to cell populations are ultimately dictated by intermolecular interactions between DNA, RNA, proteins, and other molecular species. Here we study how evolution of gene regulation in a single-cell eukaryote S. cerevisiae is affected by the interactions between transcription factors (TFs) and their cognate genomic sites. Our study is informed by high-throughput in vitro measurements of TF-DNA binding interactions and by a comprehensive collection of genomic binding sites. Using an evolutionary model for monomorphic populations evolving on a fitness landscape, we infer fitness as a function of TF-DNA binding energy for a collection of 12 yeast TFs, and show that the shape of the predicted fitness functions is in broad agreement with a simple thermodynamic model of two-state TF-DNA binding. However, the effective temperature of the model is not always equal to the physical temperature, indicating selection pressures in addition to biophysical constraints caused by TF-DNA interactions. We find little statistical support for the fitness landscape in which each position in the binding site evolves independently, showing that epistasis is common in evolution of gene regulation. Finally, by correlating TF-DNA binding energies with biological properties of the sites or the genes they regulate, we are able to rule out several scenarios of site-specific selection, under which binding sites of the same TF would experience a spectrum of selection pressures depending on their position in the genome. These findings argue for the existence of universal fitness landscapes which shape evolution of all sites for a given TF, and whose properties are determined in part by the physics of protein-DNA interactions.
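
The two-state thermodynamic picture mentioned in this abstract can be sketched as follows. The code only evaluates the standard occupancy curve of a site that is either bound or unbound; the energies, the chemical potential and the two inverse temperatures are hypothetical values chosen for illustration, and reading the curve as a fitness function is a simplification rather than the authors' inference procedure.

```python
import math


def binding_probability(energy, chemical_potential, inverse_temperature):
    """Probability that a site with the given binding energy is occupied
    in a two-state (bound/unbound) thermodynamic model."""
    exponent = inverse_temperature * (energy - chemical_potential)
    return 1.0 / (1.0 + math.exp(exponent))


# Hypothetical energies (arbitrary units); lower energy means stronger binding.
energies = [-4.0, -2.0, 0.0, 2.0, 4.0]
for inverse_temperature in (0.5, 2.0):
    profile = [binding_probability(e, 0.0, inverse_temperature) for e in energies]
    print(inverse_temperature, [round(p, 3) for p in profile])
```

A larger inverse temperature (a lower effective temperature) makes the transition between bound and unbound sharper. This is why a fitted effective temperature that differs from the physical one changes how strongly small differences in binding energy matter.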

# Genome-wide targets of selection: female response to experimental removal of sexual selection in Drosophila melanogaster

Paolo Innocenti, Ilona Flis, Edward H Morrow

Despite the common assumption that promiscuity should in general be favored in males, but not in females, to date there is no consensus on the general impact of multiple mating on female fitness. Notably, very little is known about the genetic and physiological features underlying the female response to sexual selection pressures. By combining an experimental evolution approach with genomic techniques, we investigated the effects of single and multiple matings on female fecundity and gene expression. We experimentally manipulated the mating system in replicate populations of Drosophila melanogaster by removing sexual selection, with the aim of testing differences in short term post-mating effects of females evolved under different mating strategies. We show that monogamous females suffer decreased fecundity, a decrease that was partially recovered by experimentally reversing the selection pressure back to the ancestral promiscuous state. The post-mating gene expression profiles of monogamous females differ significantly from promiscuous females, involving 9% of the genes tested. These transcripts are active in several tissues, mainly ovaries, neural tissues and midgut, and are involved in metabolic processes, reproduction and signaling pathways. Our results demonstrate how the female post-mating response can evolve under different mating systems, and provide novel insights into the genes targeted by sexual selection in females, by identifying a list of candidate genes responsible for the decrease in female fecundity in the absence of promiscuity.

# The first steps of adaptation of Escherichia coli to the gut are dominated by soft sweeps

João Barroso-Batista, Ana Sousa, Marta Lourenço, Marie-Louise Bergman, Jocelyne Demengeot, Karina B. Xavier, Isabel Gordo
(Submitted on 11 Nov 2013)

The accumulation of adaptive mutations is essential for survival in novel environments. However, in clonal populations with a high mutational supply, the power of natural selection is expected to be limited. This is due to clonal interference – the competition of clones carrying different beneficial mutations – which leads to the loss of many small effect mutations and fixation of large effect ones. If interference is abundant, then mechanisms for horizontal transfer of genes, which allow the immediate combination of beneficial alleles in a single background, are expected to evolve. However, the relevance of interference in natural complex environments, such as the gut, is poorly understood. To address this issue, we studied the invasion of beneficial mutations responsible for Escherichia coli’s adaptation to the mouse gut and demonstrate the pervasiveness of clonal interference. The observed dynamics of change in frequency of beneficial mutations are consistent with soft sweeps, where a similar adaptive mutation arises repeatedly on different haplotypes without reaching fixation. The genetic basis of the adaptive mutations revealed a striking parallelism in independently evolving populations. This was mainly characterized by the insertion of transposable elements in both coding and regulatory regions of a few genes. Interestingly, in most populations we observed a complete phenotypic sweep without loss of genetic variation. The intense clonal interference during adaptation to the gut environment, demonstrated here, may be important for our understanding of the levels of strain diversity of E. coli inhabiting the human gut microbiota and of its recombination rate.

# Some mathematical tools for the Lenski experiment

Bernard Ycart (LJK), Agnès Hamon (LJK), Joël Gaffé (LAPM), Dominique Schneider (LAPM)
(Submitted on 2 Oct 2013)

The Lenski experiment is a long-term daily reproduction of Escherichia coli that has revealed phenotypic and genetic evolution over the years. Some mathematical models that could be useful in understanding the results of that experiment are reviewed here: stochastic and deterministic growth, mutation appearance and fixation, and competition of species.

# Guidelines for the design of evolve and resequencing studies

Robert Kofler, Christian Schlötterer
(Submitted on 18 Jul 2013)

Standing genetic variation provides a rich reservoir of potentially useful mutations facilitating the adaptation to novel environments. Experimental evolution studies have demonstrated that rapid and strong phenotypic responses to selection can also be obtained in the laboratory. When combined with next-generation sequencing technology, these experiments promise to identify the individual loci contributing to adaptation. Nevertheless, until now, very little is known about the design of such evolve and resequencing (E&R) studies. Here, we use forward simulations of entire genomes to evaluate different experimental designs that aim to maximize the power to detect selected variants. We show that low linkage disequilibrium in the starting population, population size, duration of the experiment and the number of replicates are the key factors in determining the power and accuracy of E&R studies. Furthermore, replication of E&R is more important for detecting the targets of selection than increasing the population size. Using an optimized design, beneficial loci with a selective advantage as low as s=0.005 can be identified at the nucleotide level. Even when a large number of loci are selected simultaneously, up to 56% can be reliably detected without incurring large numbers of false positives. Our computer simulations suggest that, with an adequate experimental design, E&R studies are a powerful tool to identify adaptive mutations from standing genetic variation and thereby provide an excellent means to analyze the trajectories of selected alleles in evolving populations.
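
To give a feel for how weak a selective advantage of s=0.005 is, the following sketch tracks the expected frequency of a beneficial allele over time. It is a deliberately simplified, deterministic haploid model without drift or linkage, not the genome-wide forward simulations used in the study. The starting frequency and the number of generations are hypothetical.

```python
import math


def next_frequency(p, s):
    """One generation of deterministic selection on a haploid locus
    where the beneficial allele has relative fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + p * s)


def frequency_after(p0, s, generations):
    """Closed form: the log-odds of the allele grow by log(1 + s) per generation."""
    log_odds = math.log(p0 / (1.0 - p0)) + generations * math.log(1.0 + s)
    return 1.0 / (1.0 + math.exp(-log_odds))


p = 0.1  # hypothetical starting frequency
s = 0.005
for _ in range(60):  # hypothetical experiment length in generations
    p = next_frequency(p, s)
print(f"iterated: {p:.4f}, closed form: {frequency_after(0.1, s, 60):.4f}")
```

Under these assumptions the expected frequency change over a few dozen generations is small, which fits the abstract's emphasis on experiment duration and replication as key factors for detection power.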

# Response to Horizontal gene transfer may explain variation in θs

Inigo Martincorena, Nicholas M. Luscombe
(Submitted on 5 Nov 2012)

In a short article submitted to arXiv [1], Maddamsetti et al. argue that the variation in the neutral mutation rate among genes in Escherichia coli that we recently reported [2] might be explained by horizontal gene transfer (HGT). To support their argument they present a reanalysis of synonymous diversity in 10 E. coli strains together with an analysis of a collection of 1,069 synonymous mutations found in repair-deficient strains in a long-term in vitro evolution experiment. Here we respond to this communication. Briefly, we explain that HGT was carefully accounted for in our study by multiple independent phylogenetic and population genetic approaches, and we show that there is no new evidence of HGT affecting our results. We also argue that caution must be exercised when comparing mutations from repair-deficient strains to data from wild-type strains, as these conditions are dominated by different mutational processes. Finally, we reanalyse Maddamsetti’s collection of mutations from a long-term in vitro experiment and we report preliminary evidence of non-random variation of the mutation rate in these repair-deficient strains.

# Horizontal gene transfer may explain variation in θs

Rohan Maddamsetti, Philip J. Hatcher, Stéphane Cruveiller, Claudine Médigue, Jeffrey E. Barrick, Richard E. Lenski
(Submitted on 28 Sep 2012)

Martincorena et al. estimated synonymous diversity ($\theta_s = 2N\mu$) across 2,930 orthologous gene alignments from 34 Escherichia coli genomes, and found substantial variation among genes in the density of synonymous polymorphisms. They argue that this pattern reflects variation in the mutation rate per nucleotide ($\mu$) among genes. However, the effective population size ($N$) is not necessarily constant across the genome. In particular, different genes may have different histories of horizontal gene transfer (HGT), whereas Martincorena et al. used a model with random recombination to calculate $\theta_s$. They did filter alignments in an effort to minimize the effects of HGT, but we doubt that any procedure can completely eliminate HGT among closely related genomes, such as E. coli living in the complex gut community.
Here we show that there is no significant variation among genes in rates of synonymous substitutions in a long-term evolution experiment with E. coli and that the per-gene rates are not correlated with $\theta_s$ estimates from genome comparisons. However, there is a significant association between $\theta_s$ and HGT events. Together, these findings imply that $\theta_s$ variation reflects different histories of HGT, not local optimization of mutation rates to reduce the risk of deleterious mutations as proposed by Martincorena et al.
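
The core of this argument can be written out in a line. The gene subscripts $i$ and $j$ are notation introduced here for illustration, and the derivation assumes only the definition of synonymous diversity given in the abstract:

$$
\frac{\theta_{s,i}}{\theta_{s,j}} = \frac{2 N_i \mu_i}{2 N_j \mu_j} = \frac{N_i}{N_j} \cdot \frac{\mu_i}{\mu_j}
$$

A twofold difference in synonymous diversity between two genes is therefore produced equally well by $\mu_i = 2\mu_j$ with $N_i = N_j$ or by $N_i = 2N_j$ with $\mu_i = \mu_j$. Diversity data alone cannot separate the two explanations, which is why the authors bring in independent mutation-rate estimates from the long-term evolution experiment.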