Distortion of genealogical properties when the sample is very large

Anand Bhaskar, Andrew G. Clark, Yun S. Song
(Submitted on 1 Aug 2013)

Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands if not millions of individuals. In addition to posing computational challenges, such large sample sizes call for carefully re-examining the theoretical foundation underlying commonly-used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright-Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For realistic demographic scenarios, we find that there are a significant number of multiple- and simultaneous-merger events under the DTWF model, which are absent in the coalescent by construction. Furthermore, for large sample sizes, there are noticeable differences in the expected number of rare variants between the coalescent and the DTWF model. To balance the tradeoff between accuracy and computational efficiency, we propose a hybrid algorithm that utilizes the DTWF model for the recent past and the coalescent for the more distant past. Our results demonstrate that the hybrid method with only a handful of generations of the DTWF model leads to a frequency spectrum that is quite close to the prediction of the full DTWF model.
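
To illustrate why multiple and simultaneous mergers matter once the sample is large, here is a minimal sketch of one generation of a haploid Wright-Fisher population. It is not the authors' exact-computation method. It only uses the standard fact that n lineages pick parents uniformly at random among N individuals; the function name `dtwf_one_generation` and the haploid, constant-size setting are assumptions made for the example.

```python
from math import comb

def dtwf_one_generation(n, N):
    """Probabilities for n lineages over one generation in a haploid
    Wright-Fisher population of size N (illustrative assumption).

    Returns (p_none, p_single, p_multi):
      p_none   - all n lineages have distinct parents (no merger)
      p_single - exactly one pair shares a parent (the only kind of
                 merger the coalescent allows)
      p_multi  - anything else: a multiple or simultaneous merger
    """
    # n distinct parents: N(N-1)...(N-n+1) / N^n
    p_none = 1.0
    for i in range(1, n):
        p_none *= 1 - i / N

    # n-1 distinct parents: choose the merging pair, then assign
    # n-1 distinct parents, C(n,2) * N(N-1)...(N-n+2) / N^n
    p_single = comb(n, 2) / N
    for i in range(1, n - 1):
        p_single *= 1 - i / N

    p_multi = 1.0 - p_none - p_single
    return p_none, p_single, p_multi


if __name__ == "__main__":
    N = 20_000
    for n in (10, 100, 1_000):
        p_none, p_single, p_multi = dtwf_one_generation(n, N)
        print(f"n={n:5d}  none={p_none:.4f}  single={p_single:.4f}  multi={p_multi:.4f}")
```

When n is much smaller than N, `p_multi` is negligible and the coalescent approximation is safe; as n approaches the order of the square root of N and beyond, `p_multi` grows and the events excluded by the coalescent become common.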

A path integral formulation of the Wright-Fisher process with genic selection

Joshua G. Schraiber
(Submitted on 29 Jul 2013)

The Wright-Fisher process with selection is an important tool in population genetics theory. Traditional analysis of this process relies on the diffusion approximation. The diffusion approximation is usually studied in a partial differential equations framework. In this paper, I introduce a path integral formalism to study the Wright-Fisher process with selection and use that formalism to obtain a simple perturbation series to approximate the transition density. The perturbation series can be understood in terms of Feynman diagrams, which have a simple probabilistic interpretation in terms of selective events. The perturbation series proves to be an accurate approximation of the transition density for weak selection and is shown to be arbitrarily accurate for any selection coefficient.
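
For readers unfamiliar with the underlying process, the following is a minimal sketch of the discrete Wright-Fisher model with genic selection, whose diffusion limit is what the paper analyzes. It does not implement the path integral or the perturbation series. The haploid setting, the fitness values 1 + s and 1, and the function names are assumptions chosen for the example.

```python
import random

def wf_genic_step(count, N, s, rng):
    """Advance one generation.

    count: current number of copies of the favored allele
    N:     haploid population size (assumed constant)
    s:     selection coefficient; favored allele has fitness 1 + s
    """
    p = count / N
    # Frequency after selection, before random sampling
    p_sel = p * (1 + s) / (1 + s * p)
    # Binomial sampling of N offspring (drift)
    return sum(rng.random() < p_sel for _ in range(N))

def simulate(count, N, s, generations, seed=0):
    """Return the trajectory of allele counts, including the start."""
    rng = random.Random(seed)
    trajectory = [count]
    for _ in range(generations):
        count = wf_genic_step(count, N, s, rng)
        trajectory.append(count)
    return trajectory


if __name__ == "__main__":
    print(simulate(count=10, N=100, s=0.05, generations=20))
```

Averaging many such trajectories gives an empirical transition density against which an approximation such as a perturbation series could be compared.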

Robust forward simulations of recurrent positive selection

Lawrence H. Uricchio, Ryan D. Hernandez
(Submitted on 24 Jul 2013)

It is well known that recurrent positive selection reduces the amount of genetic variation at linked sites. In recent decades, analytical results have been proposed to quantify the magnitude of this reduction with simple Wright-Fisher models and diffusion approximations. However, extending these results to include interference between selected sites, arbitrary selection schemes, and complicated demographic processes has proved to be challenging. Forward simulation can provide insights into these processes, but few studies have examined recurrent positive selection in a forward simulation context due to computational constraints. Here, we extend the flexible forward simulator SFS_CODE to greatly improve the efficiency of simulations of recurrent positive selection. Forward simulations are computationally intensive and often necessitate rescaling of relevant parameters (e.g., population size and sequence length) to achieve computational feasibility. However, it is not obvious that parameter rescaling will maintain expected patterns of diversity in all parameter regimes. We develop a simple method for parameter rescaling that provides the best possible computational performance for a given error tolerance, and a detailed theoretical analysis of the robustness of rescaling across the parameter space. These results show that ad hoc approaches to parameter rescaling under the recurrent hitchhiking model may not always provide sufficiently accurate dynamics, potentially skewing patterns of diversity in simulated DNA sequences.
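
The kind of rescaling discussed here can be sketched as follows. This shows the conventional ad hoc approach (shrink N and scale the per-generation rates up so that the products N·s, N·μ and N·r stay fixed), not the error-controlled method the authors develop and not the SFS_CODE interface. The `SimParams` class and `rescale` function are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class SimParams:
    N: int      # population size
    s: float    # selection coefficient
    mu: float   # per-site mutation rate per generation
    r: float    # per-site recombination rate per generation

def rescale(params, factor):
    """Shrink the population by `factor` while keeping the
    population-scaled parameters N*s, N*mu and N*r constant."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return SimParams(
        N=round(params.N / factor),
        s=params.s * factor,
        mu=params.mu * factor,
        r=params.r * factor,
    )


if __name__ == "__main__":
    original = SimParams(N=10_000, s=0.001, mu=1e-8, r=1e-8)
    scaled = rescale(original, 10)
    print(scaled)
    # Population-scaled selection strength is preserved
    print(math.isclose(original.N * original.s, scaled.N * scaled.s))
```

The per-generation selection coefficient grows with the factor, so a large factor can push `s` out of the weak-selection regime in which the scaled products determine the dynamics. This is the kind of distortion the abstract cautions against.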

Speed of adaptation and genomic signatures in arms race and trench warfare models of host-parasite coevolution

Aurelien Tellier, Stefany Moreno-Gámez, Wolfgang Stephan
(Submitted on 25 Jul 2013)

Host and parasite population genomic data are increasingly used to discover novel major genes underlying coevolution, assuming that natural selection generates two distinguishable polymorphism patterns: selective sweeps and balancing selection. These genomic signatures would result from two coevolutionary dynamics, the trench warfare with fast cycles of allele frequencies and the arms race with slow recurrent fixation of alleles. However, based on genome scans for selection, few genes for coevolution have yet been found in hosts. To address this issue, we build a gene-for-gene model with genetic drift and mutation, and integrate coalescent simulations to study observable genomic signatures at host and parasite loci. In contrast to the conventional wisdom, we show that coevolutionary cycles are not faster under the trench warfare model compared to the arms race, except for large population sizes and high values of coevolutionary costs. Based on the generated SNP frequencies, the expected balancing selection signature under the trench warfare dynamics appears to be observable only in parasite sequences, in a limited range of parameters, if effective population sizes are sufficiently large (>1000) and if selection has been acting for a long time (>4N generations). On the other hand, the typical signature of the arms race dynamics, i.e. selective sweeps, can be detected in parasite and to a lesser extent in host populations even if coevolution is recent. We suggest studying signatures of coevolution via population genomics of parasites rather than hosts, and caution against inferring coevolutionary dynamics based on the speed of coevolution.

Migration-selection balance at multiple loci and selection on dominance and recombination

Alexey Yanchukov, Stephen R. Proulx
(Submitted on 15 Jul 2013)

A steady influx of a single deleterious multilocus genotype will impose genetic load on the resident population and leave multiple descendants carrying various numbers of the foreign alleles. Provided that the foreign types are rare at equilibrium, and that all immigrant genes will eventually be eliminated by selection, the population structure can be inferred explicitly from the deterministic branching process taking place within a single immigrant lineage. Unless the migration and recombination rates were high, this simple method was a very close approximation to the simulated migration-selection balance with all possible multilocus genotypes considered.

The Changing Geometry of a Fitness Landscape Along an Adaptive Walk

Devin Greene, Kristina Crona
(Submitted on 7 Jul 2013)

It has recently been noted that the relative prevalence of the various kinds of epistasis varies along an adaptive walk. This has been explained as a result of mean regression in NK model fitness landscapes. Here we show that this phenomenon occurs quite generally in fitness landscapes. We propose a simple and general explanation for this phenomenon, confirming the role of mean regression. We provide support for this explanation with simulations, and discuss the empirical relevance of our findings.

Evolution on genotype networks leads to phenotypic entrapment

Susanna Manrubia, José A. Cuesta
(Submitted on 3 Jul 2013)

Large sets of genotypes give rise to the same phenotype because phenotypic expression is highly redundant. Accordingly, a population can accept mutations without altering its phenotype, as long as they transform its genotype into another one on the same set. By linking every pair of genotypes that are mutually accessible through mutation, genotypes organize themselves into genotype networks (GN). These networks are known to be heterogeneous and assortative. As these features condition the probability that mutations keep the phenotype unchanged—hence becoming blind to natural selection—it follows that the topology of the GN will influence the evolutionary dynamics of the population. In this letter we analyze this effect by studying the dynamics of random walks (RW) on assortative networks with arbitrary topology. We find that the probability that a RW leaves the network is smaller the longer the time spent in it—i.e., the process is not Markovian. From the biological viewpoint, this “phenotypic entrapment” entails an acceleration in the fixation of neutral mutations, thus implying a non-uniform increase in the ticking rate of the molecular clock with the age of branches in phylogenetic trees. We also show that this effect is stronger the larger the fitness of the current phenotype relative to that of neighboring phenotypes.

The rate of adaptation in large sexual populations

D. B. Weissman, O. Hallatschek
(Submitted on 2 Jul 2013)

In large populations, multiple beneficial mutations may be simultaneously spreading. In asexual populations, these mutations must either arise on the same background or compete against each other. In sexual populations, recombination can bring together beneficial alleles from different backgrounds, but tightly linked alleles may still greatly interfere with each other. We show for well-mixed populations that when this interference is strong, the genome can be seen as consisting of many effectively asexual stretches linked together. The rate at which beneficial alleles fix is thus roughly proportional to the rate of recombination, and depends only logarithmically on the mutation supply and the strength of selection. Our scaling arguments also allow us to predict, with reasonable accuracy, the distribution of effects of fixed mutations when new mutations have broadly distributed effects. We focus on the regime in which crossovers occur more frequently than beneficial mutations, as is likely to be the case for many natural populations.

The waiting time for a second mutation: an alternative to the Moran model

Rinaldo B. Schinazi
(Submitted on 28 Jun 2013)

The appearance of cancer in a tissue is thought to be the result of two or more successive mutations. We propose a stochastic model that allows for an exact computation of the distribution of the waiting time for a second mutation. This models the time of appearance of the first cancerous cell in a tissue. Our model is an alternative to the Moran model with mutations.

The impact of population demography and selection on the genetic architecture of complex traits

Kirk E. Lohmueller
(Submitted on 21 Jun 2013)

Studies of thousands of individuals have found genetic evidence for dramatic population growth in recent human history. These studies have also documented high numbers of amino acid changing polymorphisms that are likely evolutionarily important and may be of medical relevance. Here I use population genetic models to demonstrate how the recent population growth has directly led to the accumulation of deleterious amino acid changing polymorphisms. I show that recent growth increases the proportion of nonsynonymous SNPs and that the average mutation is more deleterious in an expanding population than in a non-expanded population. However, population growth does not affect the genetic load of the population. Additionally, I investigate the consequences of recent population growth on the architecture of complex traits. If a mutation’s effect on disease status is correlated with its effect on fitness, then rare variants explain a greater portion of the additive genetic variance of the trait in a population that has recently expanded than in a population that did not recently expand. Further, recent growth can increase the expected number of causal variants for a disease. Such heterogeneity will likely reduce the power of commonly used rare variant association tests. Finally, recent population growth also reduces the causal allele frequency in cases at single mutations, which could decrease the power of single-marker association tests. These findings suggest that careful consideration of recent population history will be essential for designing optimal association studies for low-frequency and rare variants.