Population genetics of identity by descent

Pier Francesco Palamara, Ph.D. thesis

Recent improvements in high-throughput genotyping and sequencing technologies have afforded the collection of massive, genome-wide datasets of DNA information from hundreds of thousands of individuals. These datasets, in turn, provide unprecedented opportunities to reconstruct the history of human populations and detect genotype-phenotype associations. Recently developed computational methods can identify long-range chromosomal segments that are identical across samples and have been transmitted from common ancestors that lived tens to hundreds of generations in the past. These segments reveal genealogical relationships that are typically unknown to the carrying individuals. In this work, we demonstrate that such identical-by-descent (IBD) segments are informative about a number of relevant population genetics features: they enable the inference of details about past population size fluctuations and migration events, and they carry the genomic signature of natural selection. We derive a mathematical model, based on coalescent theory, that allows for a quantitative description of IBD sharing across purportedly unrelated individuals, and develop inference procedures for the reconstruction of recent demographic events, where classical methodologies are statistically underpowered. We analyze IBD sharing in several contemporary human populations, including representative communities of the Jewish Diaspora, Kenyan Maasai samples, and individuals from several Dutch provinces, in all cases retrieving evidence of fine-scale demographic events from recent history. Finally, we expand the presented model to describe distributions for those sites in IBD shared segments that harbor mutation events, showing how these may be used for the inference of mutation rates in humans and other species.
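
As a rough illustration of why long shared segments point to recent common ancestors, the sketch below uses a standard coalescent approximation. This is an assumption made here for illustration and not necessarily the exact model derived in the thesis: for two haplotypes whose common ancestor at a site lived t generations ago, the shared segment extends an exponentially distributed distance with rate 2t per Morgan on each side of the site, so its total length is Erlang-2 distributed with mean 1/t Morgans. The function names are hypothetical.

```python
import math


def mean_ibd_length_cm(t_generations):
    """Mean length (in cM) of the IBD segment spanning a site, given a
    common ancestor t generations ago (assumed approximation)."""
    # Two exponential arms, each with mean 1/(2t) Morgans, sum to 1/t Morgans.
    return 100.0 / t_generations


def prob_longer_than(length_cm, t_generations):
    """Probability that the segment spanning a site exceeds length_cm,
    using the Erlang-2 survival function with rate 2t per Morgan."""
    x = 2.0 * t_generations * (length_cm / 100.0)  # rate times length in Morgans
    return (1.0 + x) * math.exp(-x)


if __name__ == "__main__":
    for t in (10, 50, 200):
        print(t, mean_ibd_length_cm(t), prob_longer_than(1.0, t))
```

Under this approximation, segments longer than a few cM are much more likely to come from ancestors tens of generations back than from ancestors hundreds of generations back, which is the intuition behind using IBD sharing for recent demography.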

The Role of Migration in the Evolution of Phenotypic Switching

Oana Carja, Robert E Furrow, Marc W Feldman

Stochastic switching is an example of phenotypic bet-hedging, where an individual can switch between different phenotypic states in a fluctuating environment. Although the evolution of stochastic switching has been studied when the environment varies temporally, there has been little theoretical work on the evolution of phenotypic switching under both spatially and temporally fluctuating selection pressures. Here we use a population genetic model to explore the interaction of temporal and spatial variation in the evolutionary dynamics of phenotypic switching. We find that spatial variation in selection is important; when selection pressures are similar across space, migration can decrease the rate of switching, but when selection pressures differ spatially, increasing migration between demes can facilitate the evolution of higher rates of switching. These results may help explain the diverse array of non-genetic contributions to phenotypic variability and phenotypic inheritance observed in both wild and experimental populations.

Genealogy of a Wright Fisher model with strong seed bank component

Jochen Blath, Bjarki Eldon, Adrián González Casanova, Noemi Kurt
(Submitted on 12 Mar 2014)

We investigate the behaviour of the genealogy of a Wright-Fisher population model under the influence of a strong seed-bank effect. More precisely, we consider a simple seed-bank age distribution with two atoms, leading to either classical or long genealogical jumps (the latter modeling the effect of seed-dormancy). We assume that the length of these long jumps scales like a power $N^\beta$ of the original population size $N$, thus giving rise to a ‘strong’ seed-bank effect. For a certain range of $\beta$, we prove that the ancestral process of a sample of $n$ individuals converges under a non-classical time-scaling to Kingman’s $n$-coalescent. Further, for a wider range of parameters, we analyze the time to the most recent common ancestor of two individuals analytically and by simulation.
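
The two-atom jump mechanism can be sketched with a small simulation of the time to the most recent common ancestor of two individuals. This is a minimal illustration under stated assumptions, not the authors' code: each ancestral step goes back one generation with probability 1 - eps or a rounded power-law number of generations with probability eps, parents are chosen uniformly among N individuals, and the names `sample_tmrca` and `eps` are hypothetical.

```python
import random


def sample_tmrca(N, beta, eps, rng):
    """Generations back to the common ancestor of two sampled individuals
    in a Wright-Fisher model with a two-atom seed-bank age distribution."""
    long_jump = max(1, round(N ** beta))  # length of a dormancy jump (assumed rounding)
    a = b = 0  # how many generations back each lineage currently sits

    def jump():
        # Long jump with probability eps, classical one-generation step otherwise.
        return long_jump if rng.random() < eps else 1

    while True:
        if a == b:
            # Two distinct individuals in the same generation: both pick parents.
            a += jump()
            b += jump()
        elif a < b:
            a += jump()  # only the more recent lineage moves
        else:
            b += jump()
        # Lineages meeting in the same generation share a parent with probability 1/N.
        if a == b and rng.random() < 1.0 / N:
            return a


if __name__ == "__main__":
    rng = random.Random(0)
    classical = [sample_tmrca(50, 0.5, 0.0, rng) for _ in range(2000)]
    seed_bank = [sample_tmrca(50, 0.5, 0.2, rng) for _ in range(2000)]
    print(sum(classical) / len(classical), sum(seed_bank) / len(seed_bank))
```

With eps set to 0 the sketch reduces to the classical Wright-Fisher case, where the pairwise coalescence time is geometric with mean N generations.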

Adaptive evolution of molecular phenotypes

Torsten Held, Armita Nourmohammad, Michael Lässig
(Submitted on 7 Mar 2014)

Molecular phenotypes link genomic information with organismic functions, fitness, and evolution. Quantitative traits are complex phenotypes that depend on multiple genomic loci. In this paper, we study the adaptive evolution of a quantitative trait under time-dependent selection, which arises from environmental changes or through fitness interactions with other co-evolving phenotypes. We analyze a model of trait evolution under mutations and genetic drift in a single-peak fitness seascape. The fitness peak performs a constrained random walk in the trait amplitude, which determines the time-dependent trait optimum in a given population. We derive analytical expressions for the distribution of the time-dependent trait divergence between populations and of the trait diversity within populations. Based on this solution, we develop a method to infer adaptive evolution of quantitative traits. Specifically, we show that the ratio of the average trait divergence and the diversity is a universal function of evolutionary time, which predicts the stabilizing strength and the driving rate of the fitness seascape. From an information-theoretic point of view, this function measures the macro-evolutionary entropy in a population ensemble, which determines the predictability of the evolutionary process. Our solution also quantifies two key characteristics of adapting populations: the cumulative fitness flux, which measures the total amount of adaptation, and the adaptive load, which is the fitness cost due to a population’s lag behind the fitness peak.

The limits of selection under plant domestication

Robin G. Allaby, Dorian Q. Fuller, James L. Kitchen
Subjects: Populations and Evolution (q-bio.PE)

Plant domestication involved a process of selection through human agency of a series of traits collectively termed the domestication syndrome. Current debate concerns the pace at which domesticated plants emerged from cultivated wild populations and how many genes were involved. Here we present simulations that test how many genes could have been involved by considering the cost of selection. We demonstrate the selection load that can be endured by populations increases with decreasing selection coefficients and greater numbers of loci down to values of about s = 0.005, causing a driving force that increases the number of loci under selection. As the number of loci under selection increases, an effect of co-selection increases resulting in individual unlinked loci being fixed more rapidly in out-crossing populations, representing a second driving force to increase the number of loci under selection. In inbreeding systems co-selection results in interference and reduced rates of fixation but does not reduce the size of the selection load that can be endured. These driving forces result in an optimum pace of genome evolution in which 50-100 loci are the most that could be under selection in a cultivation regime. Furthermore, the simulations do not preclude the existence of selective sweeps but demonstrate that they come at a cost of the selection load that can be endured and consequently a reduction of the capacity of plants to adapt to new environments, which may contribute to the explanation of why selective sweeps have been so rarely detected in genome studies.

Can one hear the shape of a population history?

Junhyong Kim, Elchanan Mossel, Miklós Z. Rácz, Nathan Ross
(Submitted on 11 Feb 2014)

Reconstructing past population size from present day genetic data is a major goal of population genetics. Recent empirical studies infer population size history using coalescent-based models applied to a small number of individuals. While it is known that the allelic spectrum is not sufficient to infer the population size history, the distribution of coalescence times is. Here we provide tight bounds on the amount of information needed to recover the population size history at a certain level of accuracy assuming data given either by exact coalescence times, or given blocks of non-recombinant DNA sequences whose loci have approximately equal times to coalescence. Importantly, we prove lower bounds showing that it is impossible to accurately deduce population histories given limited data.
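
To make the link between population size history and coalescence times concrete, the following sketch samples the coalescence time of two lineages under a piecewise-constant population size. It is an illustration under stated assumptions, not the paper's method: time is continuous and measured in generations, the pairwise coalescence rate at time t is 1/N(t) (a haploid convention), and `sample_coal_time` and the `epochs` format are hypothetical.

```python
import random


def sample_coal_time(epochs, rng):
    """Sample a pairwise coalescence time.

    epochs: list of (start_generation, population_size) pairs, sorted by
    start, with the first start equal to 0; the last epoch extends forever.
    """
    for i, (start, size) in enumerate(epochs):
        end = epochs[i + 1][0] if i + 1 < len(epochs) else float("inf")
        # Memorylessness: redraw an exponential waiting time at each epoch start.
        wait = rng.expovariate(1.0 / size)
        if start + wait < end:
            return start + wait
    raise ValueError("epochs must contain at least one entry")


if __name__ == "__main__":
    rng = random.Random(0)
    history = [(0, 10000), (200, 500), (300, 10000)]  # a hypothetical bottleneck
    times = [sample_coal_time(history, rng) for _ in range(10000)]
    in_bottleneck = sum(200 <= t < 300 for t in times) / len(times)
    print(in_bottleneck)
```

Different histories change the shape of the sampled distribution, which is why observed coalescence times carry information about past population sizes.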

The fixation time of a strongly beneficial allele in a structured population


Andreas Greven, Peter Pfaffelhuber, Cornelia Pokalyuk, Anton Wakolbinger
Comments: 41 pages, 4 figures
Subjects: Probability (math.PR); Populations and Evolution (q-bio.PE)

For a beneficial allele which enters a large unstructured population and eventually goes to fixation, it is known that the time to fixation is approximately $2\log(\alpha)/\alpha$ for a large selection coefficient $\alpha$. In the presence of spatial structure with migration between colonies we detect various regimes of the migration rate $\mu$ for which the fixation times have different asymptotics as $\alpha \to \infty$. If $\mu$ is of order $\alpha$, the allele fixes (as in the spatially unstructured case) in time $\sim 2\log(\alpha)/\alpha$. If $\mu$ is of order $\alpha^p, 0\leq p \leq 1$, the fixation time is $\sim (2 + (1-p)d) \log(\alpha)/\alpha$, where $d$ is the maximum of the migration steps that are required from the colony where the beneficial allele entered to any other colony. If $\mu = 1/\log(\alpha)$, the fixation time is $\sim (2+S)\log(\alpha)/\alpha$, where $S$ is a random time in a simple epidemic model. The main idea for our analysis is to combine a new moment dual for the process conditioned to fixation with the time reversal in equilibrium of a spatial version of Neuhauser and Krone’s ancestral selection graph.
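
The deterministic regimes quoted in the abstract can be evaluated directly. The sketch below simply codes the leading-order expressions as stated; the function names are hypothetical, and the values are asymptotic approximations for large alpha rather than exact fixation times.

```python
import math


def fixation_time_unstructured(alpha):
    """Leading-order fixation time 2*log(alpha)/alpha for large alpha."""
    return 2.0 * math.log(alpha) / alpha


def fixation_time_structured(alpha, p, d):
    """Leading-order fixation time when the migration rate is of order
    alpha**p, with d the maximal number of migration steps needed to reach
    any colony from the colony where the allele arose."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return (2.0 + (1.0 - p) * d) * math.log(alpha) / alpha


if __name__ == "__main__":
    alpha = 1000.0
    for p in (1.0, 0.5, 0.0):
        print(p, fixation_time_structured(alpha, p, d=4))
```

At p = 1 the structured expression coincides with the unstructured one, and slower migration (smaller p) adds a delay proportional to the distance d.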

The roles of standing genetic variation and evolutionary history in determining the evolvability of anti-predator strategies

Jordan Fish, Daniel R O’Donnell, Abhijna Parigi, Ian Dworkin, Aaron P Wagner

Standing genetic variation and the historical environment in which that variation arises (evolutionary history) are both potentially significant determinants of a population’s capacity for evolutionary response to a changing environment. We evaluated the relative importance of these two factors in influencing the evolutionary trajectories in the face of sudden environmental change. We used the open-ended digital evolution software Avida to examine how historic exposure to predation pressures, different levels of genetic variation, and combinations of the two, impact anti-predator strategies and competitive abilities evolved in the face of threats from new, invasive, predator populations. We show that while standing genetic variation plays some role in determining evolutionary responses, evolutionary history has the greater influence on a population’s capacity to evolve effective anti-predator traits. This adaptability likely reflects the relative ease of repurposing existing, relevant genes and traits, and the broader potential value of the generation and maintenance of adaptively flexible traits in evolving populations.

The arrival of the frequent: how bias in genotype-phenotype maps can steer populations to local optima

Ard A Louis, Steffen Schaper
(Submitted on 6 Feb 2014)

Genotype-phenotype (GP) maps specify how the random mutations that change genotypes generate variation by altering phenotypes, which, in turn, can trigger selection. Many GP maps share the following general properties: 1) The number of genotypes $N_G$ is much larger than the number of selectable phenotypes; 2) Neutral exploration changes the variation that is accessible to the population; 3) The distribution of phenotype frequencies $F_p = N_p/N_G$, with $N_p$ the number of genotypes mapping onto phenotype $p$, is highly biased: the majority of genotypes map to only a small minority of the phenotypes. Here we explore how these properties affect the evolutionary dynamics of haploid Wright-Fisher models that are coupled to a simplified and general random GP map or to a more complex RNA sequence to secondary structure map. For both maps the probability of a mutation leading to a phenotype $p$ scales to first order as $F_p$, although for the RNA map there are further correlations as well. By using mean-field theory, supported by computer simulations, we show that the discovery time $T_p$ of a phenotype $p$ similarly scales to first order as $1/F_p$ for a wide range of population sizes and mutation rates in both the monomorphic and polymorphic regimes. These differences in the rate at which variation arises can vary over many orders of magnitude. Phenotypic variation with a larger $F_p$ is therefore much more likely to arise than variation with a small $F_p$. We show, using the RNA model, that frequent phenotypes (with larger $F_p$) can fix in a population even when alternative, but less frequent, phenotypes with much higher fitness are potentially accessible. In other words, if the fittest never ‘arrive’ on the timescales of evolutionary change, then they can’t fix. We call this highly non-ergodic effect the ‘arrival of the frequent’.
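
The first-order scaling of discovery time with inverse phenotype frequency can be seen in a toy model. The sketch below is a deliberately simplified assumption, not the paper's Wright-Fisher simulation: every mutation independently produces the target phenotype with probability equal to its frequency, ignoring population size, neutral exploration, and the correlations present in the RNA map. The function names are hypothetical.

```python
import random


def discovery_time(f_p, rng):
    """Number of mutations until a phenotype of frequency f_p first appears,
    assuming each mutation hits it independently with probability f_p."""
    steps = 1
    while rng.random() >= f_p:
        steps += 1
    return steps


def mean_discovery_time(f_p, reps, rng):
    """Average discovery time over reps independent runs."""
    return sum(discovery_time(f_p, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    rng = random.Random(0)
    for f_p in (0.1, 0.01, 0.001):
        print(f_p, mean_discovery_time(f_p, 2000, rng))
```

In this toy setting the mean discovery time is exactly the reciprocal of the frequency, so a phenotype that is a thousand times rarer takes on the order of a thousand times longer to appear.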

Footprints of ancient balanced polymorphisms in genetic variation data

Ziyue Gao, Molly Przeworski, Guy Sella
(Submitted on 29 Jan 2014)

When long-lived, balancing selection can lead to trans-species polymorphisms that are shared by two or more species identical by descent. In this case, the gene genealogies at the selected sites cluster by allele instead of by species and, because of linkage, nearby neutral sites also have unusual genealogies. Although it is clear that this scenario should lead to discernible footprints in genetic variation data, notably the presence of additional neutral polymorphisms shared between species and the absence of fixed differences, the effects remain poorly characterized. We focus on the case of a single site under long-lived balancing selection and derive approximations for summaries of the data that are sensitive to a trans-species polymorphism: the length of the segment that carries most of the signals, the expected number of shared neutral SNPs within the segment and the patterns of allelic associations among them. Coalescent simulations of ancient balancing selection confirm the accuracy of our approximations. We further show that for humans and chimpanzees, and more generally for pairs of species with low genetic diversity levels, the patterns of genetic variation on which we focus are highly unlikely to be generated by neutral recurrent mutations, so these statistics are specific as well as sensitive. We discuss the implications of our results for the design and interpretation of genome scans for ancient balancing selection in apes and other taxa.