# Finite populations with frequency-dependent selection: a genealogical approach

Peter Pfaffelhuber, Benedikt Vogt
(Submitted on 28 Jul 2012)

Evolutionary models for populations of constant size are frequently studied using the Moran model, the Wright-Fisher model, or their diffusion limits. When evolution is neutral, a random genealogy given through Kingman’s coalescent is used in order to understand basic properties of such models. Here, we address the use of a genealogical perspective for models with weak frequency-dependent selection, i.e. $Ns =: \alpha$ is small, where $s$ is the fitness advantage of a fit individual and $N$ is the population size. When computing fixation probabilities, this leads either to the approach proposed by Rousset (2003), who argues how to use Kingman’s coalescent for weak selection, or to extensions of the ancestral selection graph of Neuhauser and Krone (1997) and Neuhauser (1999). As an application, we re-derive the one-third law of evolutionary game theory (Nowak et al., 2004). In addition, we provide the approximate distribution of the genealogical distance of two randomly sampled individuals under linear frequency-dependence.
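The one-third law mentioned in this abstract can be checked numerically. The sketch below is not the genealogical derivation of the paper; it assumes the standard forward-time Moran formula for the fixation probability of a single mutant in a two-strategy game. The function name `fixation_probability`, the payoff entries `a`, `b`, `c`, `d` and the selection intensity `w` are illustrative assumptions. For a coordination game with unstable interior equilibrium $x^* = (d-b)/(a-b-c+d)$, weak selection and large $N$, the law says that a single A mutant fixes with probability above the neutral value $1/N$ when $x^* < 1/3$.

```python
def fixation_probability(a, b, c, d, N, w):
    """Fixation probability of a single A mutant in a Moran process of size N.

    Assumed payoffs: A vs A -> a, A vs B -> b, B vs A -> c, B vs B -> d.
    Assumed fitness: 1 - w + w * (average payoff), with w the selection intensity.
    """
    total = 1.0  # the empty-product term of the sum
    prod = 1.0
    for i in range(1, N):
        # Average payoffs when i individuals play A (no self-interaction).
        pay_a = (a * (i - 1) + b * (N - i)) / (N - 1)
        pay_b = (c * i + d * (N - i - 1)) / (N - 1)
        f = 1 - w + w * pay_a  # fitness of an A individual
        g = 1 - w + w * pay_b  # fitness of a B individual
        prod *= g / f
        total += prod
    return 1.0 / total


N, w = 100, 1e-3  # N * w = 0.1, i.e. weak selection

# x* = 1/4 < 1/3: the mutant is expected to beat the neutral benchmark 1/N.
rho_favoured = fixation_probability(4, 0, 1, 1, N, w)
# x* = 1/2 > 1/3: the mutant is expected to fall below 1/N.
rho_disfavoured = fixation_probability(2, 0, 1, 1, N, w)

print(rho_favoured > 1 / N, rho_disfavoured < 1 / N)
```

Setting `w = 0` recovers the neutral fixation probability $1/N$, which is a convenient sanity check on the formula.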

# Genealogies of rapidly adapting populations

Richard A. Neher, Oskar Hallatschek
(Submitted on 15 Aug 2012)

The genetic diversity of a species is shaped by its recent evolutionary history and can be used to infer demographic events or selective sweeps. Most inference methods are based on the null hypothesis that natural selection is a weak evolutionary force. However, many species, particularly pathogens, are under continuous pressure to adapt in response to changing environments. A statistical framework for inference from diversity data of such populations is currently lacking. Toward this goal, we explore the properties of genealogies that emerge from models of continual adaptation. We show that lineages trace back to a small pool of highly fit ancestors, in which simultaneous coalescence of more than two lineages frequently occurs. While such multiple mergers are unlikely under the neutral coalescent, they create a unique genetic footprint in adapting populations. The site frequency spectrum of derived neutral alleles, for example, is non-monotonic and has a peak at high frequencies, whereas Tajima’s D becomes more and more negative with increasing sample size. Since multiple merger coalescents emerge in various evolutionary scenarios characterized by sustained selection pressures, we argue that they should be considered as null-models for adapting populations.
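To make the two summary statistics in this abstract concrete, here is a minimal sketch that computes Tajima's D from an unfolded site frequency spectrum, using the standard textbook definition of the statistic. The function name `tajimas_d` and the input convention (a list whose entry `i-1` counts sites with derived allele count `i`) are assumptions made for illustration; nothing here reproduces the multiple-merger calculations of the paper.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites whose derived allele
    appears i times in a sample of size n = len(sfs) + 1.
    """
    n = len(sfs) + 1
    S = sum(sfs)  # number of segregating sites
    if S == 0:
        return float("nan")  # D is undefined without segregating sites

    # Mean number of pairwise differences.
    pairs = n * (n - 1) / 2
    pi = sum(count * i * (n - i) for i, count in enumerate(sfs, start=1)) / pairs

    # Normalising constants from Tajima's definition.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# A spectrum proportional to 1/i (the neutral expectation) gives D close to 0.
neutral_like = [60, 30, 20, 15, 12]
# An excess of rare variants pushes D below 0.
rare_excess = [100, 0, 0, 0, 0]

print(tajimas_d(neutral_like), tajimas_d(rare_excess))
```

Both low-frequency and high-frequency derived alleles contribute little to pairwise diversity, so a spectrum with mass at either extreme lowers D relative to the neutral expectation.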

# Genetic Diversity and the Structure of Genealogies in Rapidly Adapting Populations

Michael M. Desai, Aleksandra M. Walczak, Daniel S. Fisher
(Submitted on 16 Aug 2012)

Positive selection distorts the structure of genealogies and hence alters patterns of genetic variation within a population. Most analyses of these distortions focus on the signatures of hitchhiking due to hard or soft selective sweeps at a single genetic locus. However, in linked regions of rapidly adapting genomes, multiple beneficial mutations at different loci can segregate simultaneously within the population, an effect known as clonal interference. This produces a subtle interplay between hitchhiking and interference effects, which leaves a unique signature of rapid adaptation on genetic variation both at the selected sites and at linked neutral loci. Here, we introduce an effective coalescent theory (a “fitness-class coalescent”) that describes how positive selection at many perfectly linked sites alters the structure of genealogies. We use this theory to calculate several simple statistics describing genetic variation within a rapidly adapting population, and to implement efficient backwards-time coalescent simulations which can be used to predict how clonal interference alters the expected patterns of molecular evolution.

# How to infer relative fitness from a sample of genomic sequences

(Submitted on 29 Aug 2012)

Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman’s coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we shall demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness-diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test *in silico* using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator which identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with the actual fitness, with the top 10% ranked being among the top 20% fittest, at a false discovery rate of 0.1-0.3 depending on the mutation/selection parameters. The ranking also makes it possible to predict the common genotype of the future population. While the inference accuracy increases monotonically with sample size, sample sizes of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks.

# A sequentially Markov conditional sampling distribution for structured populations with migration and recombination

Matthias Steinrücken, Joshua S. Paul, Yun S. Song
(Submitted on 25 Aug 2012)

Conditional sampling distributions (CSDs), sometimes referred to as copying models, underlie numerous practical tools in population genomic analyses. Though an important application that has received much attention is the inference of population structure, the explicit exchange of migrants at specified rates has not hitherto been incorporated into the CSD in a principled framework. Recently, in the case of a single panmictic population, a sequentially Markov CSD has been developed as an accurate, efficient approximation to a principled CSD derived from the diffusion process dual to the coalescent with recombination. In this paper, the sequentially Markov CSD framework is extended to incorporate subdivided population structure, thus providing an efficiently computable CSD that admits a genealogical interpretation related to the structured coalescent with migration and recombination. As a concrete application, it is demonstrated empirically that the CSD developed here can be employed to yield accurate estimation of a wide range of migration rates.
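For readers unfamiliar with copying models, the following sketch shows the general idea behind a CSD using the classic single-population copying hidden Markov model in the style of Li and Stephens, evaluated with the forward algorithm. It is not the structured, sequentially Markov CSD developed in this paper: there are no demes or migration rates, and the function name `copying_likelihood`, the per-site miscopy probability `mu` and the per-interval switch probability `r` are assumptions chosen for illustration.

```python
def copying_likelihood(new_hap, panel, mu, r):
    """Probability of new_hap given a panel of previously sampled haplotypes.

    Hidden state: which panel haplotype is being copied at each site.
    mu: assumed probability that a copied allele is altered (biallelic sites).
    r:  assumed probability of re-drawing the copied haplotype uniformly
        between adjacent sites.
    """
    K = len(panel)
    L = len(new_hap)

    def emit(k, site):
        # Match with probability 1 - mu, mismatch with probability mu.
        return 1 - mu if panel[k][site] == new_hap[site] else mu

    # Initialisation: copy a uniformly chosen haplotype at the first site.
    alpha = [emit(k, 0) / K for k in range(K)]

    # Forward recursion over the remaining sites.
    for site in range(1, L):
        total = sum(alpha)
        alpha = [
            emit(k, site) * ((1 - r) * alpha[k] + r * total / K)
            for k in range(K)
        ]
    return sum(alpha)


panel = [(0, 0, 1), (1, 1, 0)]
# A mosaic of the two panel haplotypes is cheaper to explain with switching.
print(copying_likelihood((0, 0, 0), panel, mu=0.01, r=0.1))
```

Because the recursion only needs the previous column of forward values, the cost is linear in both the number of sites and the panel size, which is what makes copying models practical at genomic scale.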

# The variance of identity-by-descent sharing in the Wright-Fisher model

Shai Carmi, Pier Francesco Palamara, Vladimir Vacic, Todd Lencz, Ariel Darvasi, Itsik Pe’er
(Submitted on 21 Jun 2012)

Widespread sharing of long, identical-by-descent (IBD) genetic segments is a hallmark of populations that have experienced a recent bottleneck. The detection of these IBD segments is now feasible, enabling a wide range of applications from phasing and imputation to demographic inference. Here, we study the distribution of IBD sharing in the Wright-Fisher model. Using coalescent theory, we calculate the mean and variance of the total sharing between arbitrary pairs of individuals. We then study the cohort-averaged sharing: the average total sharing between one individual and the rest of the cohort. We find that for large cohorts, the cohort-averaged sharing is distributed approximately normally. Surprisingly, the variance of this distribution remains large even for large cohorts, implying the existence of “hyper-sharing” individuals. The presence of such individuals has important consequences for the design of sequencing studies, since, if they are selected for whole-genome sequencing, a larger fraction of the cohort can be subsequently imputed. We calculate the expected gain in power of imputation by IBD, and subsequently, in power to detect an association, when individuals are either randomly selected or are specifically the hyper-sharing individuals. Finally, we study the distribution of pairwise sharing and cohort-averaged sharing in the Ashkenazi Jewish population.

# Single-crossover recombination and ancestral recombination trees

Ellen Baake, Ute von Wangenheim

We consider the Wright-Fisher model for a population of $N$ individuals, each identified with a sequence of a finite number of sites, and single-crossover recombination between them. We trace back the ancestry of single individuals from the present population. In the $N \to \infty$ limit without rescaling of parameters or time, this ancestral process is described by a random tree, whose branching events correspond to the splitting of the sequence due to recombination. With the help of a decomposition of the trees into subtrees and an inclusion-exclusion principle, we find a closed-form expression for the probabilities of the topologies of the ancestral trees. At the same time, these probabilities lead to an explicit solution of the deterministic single-crossover equation. The latter is a discrete-time dynamical system that emerges from the Wright-Fisher model via a law of large numbers and has been waiting for a solution for many decades.
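As a rough illustration of the deterministic single-crossover dynamics, the sketch below iterates one generation numerically. It assumes the commonly used discrete-time form in which, with probability `rho[j]`, a crossover between sites `j` and `j+1` replaces the type distribution by the product of its marginals on the two sides of the cut, and with the remaining probability the distribution is left unchanged. The function name `recombination_step` and the dictionary representation of type frequencies are assumptions; the closed-form solution of the paper is not reproduced here.

```python
def recombination_step(p, rho):
    """One generation of the (assumed) deterministic single-crossover equation.

    p:   dict mapping every possible type (a tuple of alleles, one per site)
         to its frequency; types with frequency 0 must be listed too.
    rho: rho[j] is the probability of a crossover between sites j and j + 1.
    """
    types = list(p)
    no_crossover = 1 - sum(rho)
    new_p = {x: no_crossover * p[x] for x in types}

    for j, r in enumerate(rho):
        cut = j + 1
        # Marginal distributions on the sites left and right of the cut.
        left, right = {}, {}
        for x, freq in p.items():
            left[x[:cut]] = left.get(x[:cut], 0.0) + freq
            right[x[cut:]] = right.get(x[cut:], 0.0) + freq
        # A crossover at this position pairs independent left and right parts.
        for x in types:
            new_p[x] += r * left[x[:cut]] * right[x[cut:]]
    return new_p


# Two sites, complete initial association between the alleles.
p0 = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
p1 = recombination_step(p0, rho=[0.1])

# Linkage disequilibrium before and after one generation.
ld0 = p0[(0, 0)] * p0[(1, 1)] - p0[(0, 1)] * p0[(1, 0)]
ld1 = p1[(0, 0)] * p1[(1, 1)] - p1[(0, 1)] * p1[(1, 0)]
print(ld0, ld1)
```

In the two-site case the step reduces to the familiar geometric decay of linkage disequilibrium by a factor of one minus the crossover probability per generation, while the allele frequencies at each site stay fixed.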