A note on the distribution of admixture segment lengths and ancestry proportions under pulse and two-wave admixture models

Shai Carmi, James Xue, Itsik Pe’er
(Submitted on 19 Sep 2015)

Admixed populations are formed by the merging of two or more ancestral populations, and the ancestry of each locus in an admixed genome derives from either source. Consider a simple “pulse” admixture model, where populations A and B merged t generations ago without subsequent gene flow. We derive the distribution of the proportion of an admixed chromosome that has A (or B) ancestry, as a function of the chromosome length L, t, and the initial contribution of the A source, m. We demonstrate that these results can be used for inference of the admixture parameters. For more complex admixture models, we derive an expression in Laplace space for the distribution of ancestry proportions that depends on having the distribution of the lengths of segments of each ancestry. We obtain explicit results for the special case of a “two-wave” admixture model, where population A contributed additional migrants in one of the generations between the present and the initial admixture event. Specifically, we derive formulas for the distribution of A and B segment lengths and numerical results for the distribution of ancestry proportions. We show that for recent admixture, data generated under a two-wave model can hardly be distinguished from that generated under a pulse model.
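The pulse model can be explored with a small simulation. The sketch below is not the authors' derivation. It assumes the common Markov approximation in which ancestry alternates along the chromosome, with A segments exponentially distributed with rate (1 - m)t per Morgan and B segments with rate mt per Morgan. That parametrization, and the function name `ancestry_proportion`, are assumptions made for illustration.

```python
import random

def ancestry_proportion(L, t, m, rng):
    """Fraction of a chromosome of length L (Morgans) carrying A ancestry.

    Assumed Markov approximation (not taken from the abstract): A segments
    have Exp((1 - m) * t) lengths, B segments have Exp(m * t) lengths, and
    the two ancestries alternate. Requires t > 0.
    """
    if m <= 0.0:
        return 0.0
    if m >= 1.0:
        return 1.0
    in_a = rng.random() < m      # ancestry at the left end, drawn with probability m
    pos, a_len = 0.0, 0.0
    while pos < L:
        rate = (1.0 - m) * t if in_a else m * t
        seg = min(rng.expovariate(rate), L - pos)  # truncate at the chromosome end
        if in_a:
            a_len += seg
        pos += seg
        in_a = not in_a          # switch ancestry at the segment boundary
    return a_len / L

rng = random.Random(1)
props = [ancestry_proportion(2.0, 10, 0.3, rng) for _ in range(5000)]
print(sum(props) / len(props))   # sample mean; the expectation under this sketch is m
```

Under these assumptions the mean proportion stays at m, while the spread across chromosomes shrinks as t or L grows. That dependence is what makes the distribution informative about the admixture parameters.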

Exact simulation of the Wright-Fisher diffusion

Paul A. Jenkins, Dario Spano
(Submitted on 23 Jun 2015)

The Wright-Fisher family of diffusion processes is a class of evolutionary models widely used in population genetics, with applications also in finance and Bayesian statistics. Simulation of, and inference from, these diffusions are therefore of widespread interest. However, simulating a Wright-Fisher diffusion is difficult because there is no known closed-form formula for its transition function. In this article we demonstrate that it is in fact possible to simulate exactly from the scalar Wright-Fisher diffusion with general drift, extending ideas based on retrospective simulation. Our key idea is to exploit an eigenfunction expansion representation of the transition function. This approach also yields methods for exact simulation from several processes related to the Wright-Fisher diffusion: (i) its moment dual, the ancestral process of an infinite-leaf Kingman coalescent tree; (ii) its infinite-dimensional counterpart, the Fleming-Viot process; and (iii) its bridges. Finally, we illustrate our method with an application to an evolutionary model for mutation and diploid selection. We believe our new perspective on diffusion simulation holds promise for other models admitting a transition eigenfunction expansion.
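For contrast with exact simulation, the sketch below shows a naive Euler-Maruyama discretization of a scalar Wright-Fisher diffusion. It is explicitly not the algorithm of the article: it is an approximate scheme with discretization error and a crude clamp at the boundaries, which is the kind of bias an exact method avoids. The time scaling (diffusion coefficient sqrt(x(1 - x))), the function name `euler_maruyama_wf`, and the user-supplied `drift` argument are assumptions made for illustration.

```python
import math
import random

def euler_maruyama_wf(x0, t_end, dt, drift, rng):
    """Approximate endpoint of dX = drift(X) dt + sqrt(X (1 - X)) dW.

    Naive Euler-Maruyama scheme; NOT an exact sampler. The allele
    frequency is clamped to [0, 1] after every step.
    """
    x = x0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        noise = rng.gauss(0.0, math.sqrt(dt))            # Brownian increment
        diffusion = math.sqrt(max(x * (1.0 - x), 0.0))   # guard against rounding below 0
        x += drift(x) * dt + diffusion * noise
        x = min(1.0, max(0.0, x))                        # crude boundary handling
    return x

rng = random.Random(7)
neutral = lambda x: 0.0   # zero drift: the neutral case
ends = [euler_maruyama_wf(0.5, 0.1, 0.001, neutral, rng) for _ in range(2000)]
print(sum(ends) / len(ends))   # sample mean of the endpoint frequency
```

Shrinking `dt` reduces the discretization error but never removes it, and the clamp distorts behaviour near 0 and 1.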

Linkage disequilibrium between single nucleotide polymorphisms and hypermutable loci

Sterling Sawaya, Matt Jones, Matt Keller
doi: http://dx.doi.org/10.1101/020909

Some diseases are caused by genetic loci with a high rate of change, and heritability in complex traits is likely to be partially caused by variation at these loci. These hypermutable elements, such as tandem repeats, change at rates that are orders of magnitude higher than the rates at which most single nucleotides mutate. However, single nucleotide polymorphisms, or SNPs, are currently the primary focus of genetic studies of human disease. Here we quantify the degree to which SNPs are correlated with hypermutable loci, examining a range of mutation rates that correspond to mutation rates at tandem repeat loci. We use established population genetics theory to relate mutation rates to recombination rates and compare the theoretical predictions to simulations. Both simulations and theory agree that, at the highest mutation rates, almost all correlation is lost between a hypermutable locus and surrounding SNPs. The theoretical predictions break down for middle to low mutation rates, differing widely from the simulated results. The simulation results suggest that some correlation remains between SNPs and hypermutable loci when mutation rates are on the lower end of the mutation spectrum. Consequently, in some cases SNPs can tag variation caused by tandem repeat loci. We also examine the linkage between SNPs and other SNPs and uncover ways in which the linkage disequilibrium of rare SNPs differs from that of hypermutable loci.
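One standard way to quantify correlation between two loci is the r² statistic computed from haplotype counts. The sketch below assumes both loci are biallelic, which is a simplification for tandem repeats, since those are usually multi-allelic. The abstract does not say which measure the authors use, so the choice of r² and the function name `r_squared` are assumptions.

```python
def r_squared(hap_counts):
    """r^2 between two biallelic loci from haplotype counts.

    hap_counts maps 'AB', 'Ab', 'aB', 'ab' to counts, where A/a are the
    alleles at the first locus and B/b the alleles at the second.
    """
    n = sum(hap_counts.values())
    p_ab = hap_counts['AB'] / n                          # frequency of the AB haplotype
    p_a = (hap_counts['AB'] + hap_counts['Ab']) / n      # frequency of allele A
    p_b = (hap_counts['AB'] + hap_counts['aB']) / n      # frequency of allele B
    d = p_ab - p_a * p_b                                 # linkage disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return float('nan')                              # undefined if either locus is monomorphic
    return d * d / denom

print(r_squared({'AB': 50, 'Ab': 0, 'aB': 0, 'ab': 50}))    # perfect association
print(r_squared({'AB': 25, 'Ab': 25, 'aB': 25, 'ab': 25}))  # no association
```

A SNP can tag a second locus only when r² between them is appreciably above zero. Recurrent mutation at the second locus moves alleles onto new haplotype backgrounds and pushes r² toward zero.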

A Coalescent Model of a Sweep from a Uniquely Derived Standing Variant

Jeremy J Berg, Graham Coop
doi: http://dx.doi.org/10.1101/019612

The use of genetic polymorphism data to understand the dynamics of adaptation and identify the loci that are involved has become a major pursuit of modern evolutionary genetics. In addition to the classical “hard sweep” hitchhiking model, recent research has drawn attention to the fact that the dynamics of adaptation can play out in a variety of different ways, and that the specific signatures left behind in population genetic data may depend somewhat strongly on these dynamics. One particular model for which a large number of empirical examples are already known is that in which a single derived mutation arises and drifts to some low frequency before an environmental change causes the allele to become beneficial and sweep to fixation. Here, we pursue an analytical investigation of this model, bolstered and extended via simulation study. We use coalescent theory to develop an analytical approximation for the effect of a sweep from standing variation on the genealogy at the locus of the selected allele and sites tightly linked to it. We show that the distribution of haplotypes on which the selected allele is present at the time of the environmental change can be approximated by considering recombinant haplotypes as alleles in the infinite alleles model. We show that this approximation can be leveraged to make accurate predictions regarding patterns of genetic polymorphism following such a sweep. We then use simulations to highlight which sources of haplotypic information are likely to be most useful in distinguishing this model from neutrality, as well as from other sweep models, such as the classic hard sweep and multiple-mutation soft sweeps. We find that in general, adaptation from a uniquely derived standing variant will be difficult to detect on the basis of genetic polymorphism data alone, and when it can be detected, it will be difficult to distinguish from other varieties of selective sweeps.

Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum

Jonathan Terhorst, Yun S. Song
(Submitted on 16 May 2015)

The sample frequency spectrum (SFS) of DNA sequences from a collection of individuals is a summary statistic which is commonly used for parametric inference in population genetics. Despite the popularity of SFS-based inference methods, currently little is known about the information-theoretic limit on the estimation accuracy as a function of sample size. Here, we show that using the SFS to estimate the size history of a population has a minimax error of at least O(1/log s), where s is the number of independent segregating sites used in the analysis. This rate is exponentially worse than known convergence rates for many classical estimation problems in statistics. Another surprising aspect of our theoretical bound is that it does not depend on the dimension of the SFS, which is related to the number of sampled individuals. This means that, for a fixed number s of segregating sites considered, using more individuals does not help to reduce the minimax error bound. Our result pertains to populations that have experienced a bottleneck, and we argue that it can be expected to apply to many populations in nature.
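To get a feel for how slow a 1/log s rate is, the sketch below compares it with the 1/sqrt(s) rate typical of classical parametric estimation. Constants are ignored and the natural logarithm is used. Both choices, and the function names, are assumptions made for illustration rather than details taken from the paper.

```python
import math

def log_rate(s):
    """Order of the minimax lower bound, 1 / log(s), ignoring constants."""
    return 1.0 / math.log(s)

def parametric_rate(s):
    """Classical parametric rate, 1 / sqrt(s), ignoring constants."""
    return 1.0 / math.sqrt(s)

for s in (10**3, 10**6, 10**9):
    # Print both rates side by side for increasing numbers of segregating sites.
    print(f"s = {s:>10d}  1/log s = {log_rate(s):.4f}  1/sqrt(s) = {parametric_rate(s):.6f}")
```

Going from a thousand to a million segregating sites only halves 1/log s, whereas 1/sqrt(s) falls by a factor of about 32.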

Coalescent times and patterns of genetic diversity in species with facultative sex: effects of gene conversion, population structure and heterogeneity

Matthew Hartfield, Stephen I. Wright, Aneil F. Agrawal

Many diploid organisms undergo facultative sexual reproduction. However, little is currently known concerning the distribution of neutral genetic variation amongst facultative sexuals except in very simple cases. Understanding this distribution is important when making inferences about rates of sexual reproduction, effective population size and demographic history. Here, we extend coalescent theory in diploids with facultative sex to consider gene conversion, selfing, population subdivision, and temporal and spatial heterogeneity in rates of sex. In addition to analytical results for two-sample coalescent times, we outline a coalescent algorithm that accommodates the complexities arising from partial sex; this algorithm can be used to generate multi-sample coalescent distributions. A key result is that when sex is rare, gene conversion becomes a significant force in reducing diversity within individuals, which can remove genomic signatures of infrequent sex (the ‘Meselson Effect’) or entirely reverse the predictions. Our models offer improved methods for assessing the null model (i.e. neutrality) of patterns of molecular variation in facultative sexuals.

Mitochondria, mutations and sex: a new hypothesis for the evolution of sex based on mitochondrial mutational erosion

Justin Havird, Matthew D Hall, Damian Dowling
doi: http://dx.doi.org/10.1101/019125

The evolution of sex in eukaryotes represents a paradox, given the “two-fold” fitness cost it incurs. We hypothesize that the mutational dynamics of the mitochondrial genome would have favoured the evolution of sexual reproduction. Mitochondrial DNA (mtDNA) exhibits a high mutation rate across most eukaryote taxa, and several lines of evidence suggest this high rate is an ancestral character. This seems inexplicable given mtDNA-encoded genes underlie the expression of life’s most salient functions, including energy conversion. We propose that negative metabolic effects linked to mitochondrial mutation accumulation would have invoked selection for sexual recombination between divergent host nuclear genomes in early eukaryote lineages. This would provide a mechanism by which recombinant host genotypes could be rapidly shuffled and screened for the presence of compensatory modifiers that offset mtDNA-induced harm. Under this hypothesis, recombination provides the genetic variation necessary for compensatory nuclear coadaptation to keep pace with mitochondrial mutation accumulation.