Exact simulation of the Wright-Fisher diffusion
Paul A. Jenkins, Dario Spano
(Submitted on 23 Jun 2015)
The Wright-Fisher family of diffusion processes is a class of evolutionary models widely used in population genetics, with applications also in finance and Bayesian statistics. Simulation and inference from these diffusions is therefore of widespread interest. However, simulating a Wright-Fisher diffusion is difficult because there is no known closed-form formula for its transition function. In this article we demonstrate that it is in fact possible to simulate exactly from the scalar Wright-Fisher diffusion with general drift, extending ideas based on retrospective simulation. Our key idea is to exploit an eigenfunction expansion representation of the transition function. This approach also yields methods for exact simulation from several processes related to the Wright-Fisher diffusion: (i) its moment dual, the ancestral process of an infinite-leaf Kingman coalescent tree; (ii) its infinite-dimensional counterpart, the Fleming-Viot process; and (iii) its bridges. Finally, we illustrate our method with an application to an evolutionary model for mutation and diploid selection. We believe our new perspective on diffusion simulation holds promise for other models admitting a transition eigenfunction expansion.
Linkage disequilibrium between single nucleotide polymorphisms and hypermutable loci
Sterling Sawaya, Matt Jones, Matt Keller
Some diseases are caused by genetic loci with a high rate of change, and heritability in complex traits is likely to be partially caused by variation at these loci. These hypermutable elements, such as tandem repeats, change at rates that are orders of magnitude higher than the rates at which most single nucleotides mutate. However, single nucleotide polymorphisms, or SNPs, are currently the primary focus of genetic studies of human disease. Here we quantify the degree to which SNPs are correlated with hypermutable loci, examining a range of mutation rates that correspond to mutation rates at tandem repeat loci. We use established population genetics theory to relate mutation rates to recombination rates and compare the theoretical predictions to simulations. Both simulations and theory agree that, at the highest mutation rates, almost all correlation is lost between a hypermutable locus and surrounding SNPs. The theoretical predictions break down for middle to low mutation rates, differing widely from the simulated results. The simulation results suggest that some correlation remains between SNPs and hypermutable loci when mutation rates are on the lower end of the mutation spectrum. Consequently, in some cases SNPs can tag variation caused by tandem repeat loci. We also examine the linkage between SNPs and other SNPs and uncover ways in which the linkage disequilibrium of rare SNPs differs from that of hypermutable loci.
A Coalescent Model of a Sweep from a Uniquely Derived Standing Variant
Jeremy J Berg, Graham Coop
The use of genetic polymorphism data to understand the dynamics of adaptation and identify the loci that are involved has become a major pursuit of modern evolutionary genetics. In addition to the classical “hard sweep” hitchhiking model, recent research has drawn attention to the fact that the dynamics of adaptation can play out in a variety of different ways, and that the specific signatures left behind in population genetic data may depend somewhat strongly on these dynamics. One particular model for which a large number of empirical examples are already known is that in which a single derived mutation arises and drifts to some low frequency before an environmental change causes the allele to become beneficial and sweeps to fixation. Here, we pursue an analytical investigation of this model, bolstered and extended via simulation study. We use coalescent theory to develop an analytical approximation for the effect of a sweep from standing variation on the genealogy at the locus of the selected allele and sites tightly linked to it. We show that the distribution of haplotypes that the selected allele is present on at the time of the environmental change can be approximated by considering recombinant haplotypes as alleles in the infinite alleles model. We show that this approximation can be leveraged to make accurate predictions regarding patterns of genetic polymorphism following such a sweep. We then use simulations to highlight which sources of haplotypic information are likely to be most useful in distinguishing this model from neutrality, as well as from other sweep models, such as the classic hard sweep, and multiple mutation soft sweeps. We find that in general, adaptation from a uniquely derived standing variant will be difficult to detect on the basis of genetic polymorphism data alone, and when it can be detected, it will be difficult to distinguish from other varieties of selective sweeps.
Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum
Jonathan Terhorst, Yun S. Song
(Submitted on 16 May 2015)
The sample frequency spectrum (SFS) of DNA sequences from a collection of individuals is a summary statistic which is commonly used for parametric inference in population genetics. Despite the popularity of SFS-based inference methods, currently little is known about the information-theoretic limit on the estimation accuracy as a function of sample size. Here, we show that using the SFS to estimate the size history of a population has a minimax error of at least O(1/logs), where s is the number of independent segregating sites used in the analysis. This rate is exponentially worse than known convergence rates for many classical estimation problems in statistics. Another surprising aspect of our theoretical bound is that it does not depend on the dimension of the SFS, which is related to the number of sampled individuals. This means that, for a fixed number s of segregating sites considered, using more individuals does not help to reduce the minimax error bound. Our result pertains to populations that have experienced a bottleneck, and we argue that it can be expected to apply to many populations in nature.
Coalescent times and patterns of genetic diversity in species with facultative sex: effects of gene conversion, population structure and heterogeneity
Matthew Hartfield , Stephen I. Wright , Aneil F. Agrawal
Many diploid organisms undergo facultative sexual reproduction. However, little is currently known concerning the distribution of neutral genetic variation amongst facultative sexuals except in very simple cases. Understanding this distribution is important when making inferences about rates of sexual reproduction, effective population size and demographic history. Here, we extend coalescent theory in diploids with facultative sex to consider gene conversion, selfing, population subdivision, and temporal and spatial heterogeneity in rates of sex. In addition to analytical results for two-sample coalescent times, we outline a coalescent algorithm that accommodates the complexities arising from partial sex; this algorithm can be used to generate multi-sample coalescent distributions. A key result is that when sex is rare, gene conversion becomes a significant force in reducing diversity within individuals, which can remove genomic signatures of infrequent sex (the ‘Meselson Effect’) or entirely reverse the predictions. Our models offer improved methods for assessing the null model (I.e. neutrality) of patterns of molecular variation in facultative sexuals.
Mitochondria, mutations and sex: a new hypothesis for the evolution of sex based on mitochondrial mutational erosion
Justin Havird , Matthew D Hall , Damian Dowling
The evolution of sex in eukaryotes represents a paradox, given the “two-fold” fitness cost it incurs. We hypothesize that the mutational dynamics of the mitochondrial genome would have favoured the evolution of sexual reproduction. Mitochondrial DNA (mtDNA) exhibits a high mutation rate across most eukaryote taxa, and several lines of evidence suggest this high rate is an ancestral character. This seems inexplicable given mtDNA-encoded genes underlie the expression of life’s most salient functions, including energy conversion. We propose that negative metabolic effects linked to mitochondrial mutation accumulation would have invoked selection for sexual recombination between divergent host nuclear genomes in early eukaryote lineages. This would provide a mechanism by which recombinant host genotypes could be rapidly shuffled and screened for the presence of compensatory modifiers that offset mtDNA-induced harm. Under this hypothesis, recombination provides the genetic variation necessary for compensatory nuclear coadaptation to keep pace with mitochondrial mutation accumulation.
The generalised quasispecies
Raphaël Cerf, Joseba Dalmau
(Submitted on 22 Apr 2015)
We study Eigen’s quasispecies model in the asymptotic regime where the length of the genotypes goes to infinity and the mutation probability goes to 0. We give several explicit formulas for the stationary solutions of the limiting system of differential equations.