Populations in statistical genetic modelling and inference

Daniel John Lawson
(Submitted on 4 Jun 2013)

What is a population? This review considers how a population may be defined in terms of understanding the structure of the underlying genetics of the individuals involved. The main approach is to consider statistically identifiable groups of randomly mating individuals, which is well defined in theory for any type of (sexual) organism. We discuss generative models using drift, admixture and spatial structure, and the ancestral recombination graph. These are contrasted with statistical models for inference, principal component analysis and other ‘non-parametric’ methods. The relationships between these approaches are explored with both simulated and real-data examples. The state-of-the-art practical software tools are discussed and contrasted. We conclude that populations are a useful theoretical construct that can be well defined in theory and often approximately exist in practice.

The Dynamics of Genetic Draft in Rapidly Adapting Populations

Katya Kosheleva, Michael Desai
(Submitted on 30 May 2013)

The accumulation of beneficial mutations on many competing genetic backgrounds in rapidly adapting populations has a striking impact on evolutionary dynamics. This effect, known as clonal interference, causes erratic fluctuations in the frequencies of observed mutations, randomizes the fixation times of successful mutations, and leaves distinct signatures on patterns of genetic variation. Here, we show how this form of ‘genetic draft’ affects the forward-time dynamics of site frequencies in rapidly adapting asexual populations. We calculate the probability that mutations at individual sites shift in frequency over a characteristic timescale, extending Gillespie’s original model of draft to the case where many strongly selected beneficial mutations segregate simultaneously. We then derive the sojourn time of mutant alleles, the expected fixation time of successful mutants, and the site frequency spectrum of beneficial and neutral mutations. We show how this form of draft affects inferences in the McDonald-Kreitman test, and how it relates to recent observations that some aspects of genetic diversity are described by the Bolthausen-Sznitman coalescent in the limit of very rapid adaptation. Finally, we describe how our method can be extended to model evolution on fitness landscapes that include some forms of epistasis, such as landscapes that are partitioned into two or more incompatible evolutionary trajectories.

The common ancestor process revisited

Sandra Kluth, Thiemo Hustedt, Ellen Baake
(Submitted on 25 May 2013)

We consider the Moran model in continuous time with two types, mutation, and selection. We concentrate on the ancestral line and its stationary type distribution. Building on work by Fearnhead (J. Appl. Prob. 39 (2002), 38-54) and Taylor (Electron. J. Probab. 12 (2007), 808-847), we characterise this distribution via the fixation probability of the offspring of all individuals of favourable type (regardless of the offspring’s types). We concentrate on a finite population and stay with the resulting discrete setting all the way through. This way, we extend previous results and gain new insight into the underlying particle picture.
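
As a point of reference for the fixation probabilities mentioned in this abstract, the following minimal sketch computes the fixation probability of a favourable type in a finite two-type Moran model with selection. It deliberately ignores mutation, so it is a simplification of the model described above and not the authors' construction. The function names and the relative-fitness parameter `r` are assumptions introduced here for illustration.

```python
def fixation_probability(n_total, n_start, r):
    """Fixation probability of the favourable type in a two-type Moran model
    WITHOUT mutation (an assumption of this sketch).

    n_total: population size N
    n_start: initial number of favourable individuals
    r:       relative fitness of the favourable type (r = 1 is neutral)

    Uses the general birth-death chain formula: in every interior state the
    ratio of the down-step to the up-step probability is 1/r.
    """
    if not 0 <= n_start <= n_total:
        raise ValueError("n_start must lie between 0 and n_total")
    ratio = 1.0 / r
    # terms[k] is the product of k down/up ratios, i.e. (1/r)**k
    terms = [ratio ** k for k in range(n_total)]
    return sum(terms[:n_start]) / sum(terms)


def fixation_probability_closed_form(n_total, n_start, r):
    """Closed form of the same quantity, used here as a cross-check."""
    if r == 1.0:
        return n_start / n_total
    return (1.0 - r ** (-n_start)) / (1.0 - r ** (-n_total))
```

For example, `fixation_probability(100, 1, 1.1)` gives the chance that a single favourable individual with a 10% fitness advantage takes over a population of 100, and it agrees with the closed form up to floating-point error.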

Statistical properties of the site-frequency spectrum associated with Lambda-coalescents

Matthias Birkner, Jochen Blath, Bjarki Eldon
(Submitted on 26 May 2013)

We study statistical properties of the site-frequency spectrum associated with Lambda-coalescents. In particular, we derive recursions for the expected value, variance, and covariance of the spectrum, extending earlier results of Fu (1995) for the classical Kingman coalescent. Our focus is on estimating coalescent parameters introduced by certain Lambda-coalescents for datasets too large for full likelihood methods. The recursions for the expected values we obtain can be used to find the parameter values which give the best fit to the observed frequency spectrum. The expected values are also used to approximate the probability that a (derived) mutation arises on a branch subtending a given number of leaves (DNA sequences), allowing us to apply a pseudo-likelihood inference to estimate coalescence parameters associated with certain subclasses of Lambda-coalescents. The properties of the pseudo-likelihood approach are investigated on real and simulated datasets. Our results for two subclasses of Lambda-coalescents show that one can distinguish these subclasses from the Kingman coalescent, as well as between the Lambda-subclasses. In addition, our results yield further support for multiple merger coalescents as an appropriate ‘null’ model at the mitochondrial DNA level for high-fecundity Atlantic cod (Gadus morhua).
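
To illustrate the idea of comparing an observed spectrum with expected values, here is a minimal sketch for the classical Kingman baseline only, where the expected number of sites with i derived copies in a sample of n sequences is theta / i (Fu 1995). The Lambda-coalescent recursions from the paper are not reproduced. The function names and the squared-distance criterion are assumptions made for illustration, not the authors' pseudo-likelihood.

```python
def expected_sfs_kingman(n, theta):
    """Expected unfolded site-frequency spectrum under the Kingman coalescent.

    Returns [E[xi_1], ..., E[xi_{n-1}]] with E[xi_i] = theta / i, where theta
    is the population-scaled mutation rate and n is the sample size.
    """
    if n < 2:
        raise ValueError("need at least two sequences")
    return [theta / i for i in range(1, n)]


def normalised(spectrum):
    """Scale a spectrum so that its entries sum to one."""
    total = sum(spectrum)
    return [x / total for x in spectrum]


def squared_distance(observed, expected):
    """Sum of squared differences between two normalised spectra.

    An illustrative goodness-of-fit measure (an assumption of this sketch).
    """
    obs, exp = normalised(observed), normalised(expected)
    return sum((o - e) ** 2 for o, e in zip(obs, exp))
```

Because the normalised Kingman spectrum does not depend on theta, a large value of `squared_distance(observed, expected_sfs_kingman(n, 1.0))` hints at a departure from the Kingman shape, for instance an excess of singletons.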

The deleterious mutation load is insensitive to recent population history

Yuval B. Simons, Michael C. Turchin, Jonathan K. Pritchard, Guy Sella
(Submitted on 9 May 2013)

Human populations have undergone dramatic changes in population size in the past 100,000 years, including a severe bottleneck of non-African populations and recent explosive population growth. There is currently great interest in how these demographic events may have affected the burden of deleterious mutations in individuals and the allele frequency spectrum of disease mutations in populations. Here we use population genetic models to show that, contrary to previous conjectures, recent human demography has likely had very little impact on the average burden of deleterious mutations carried by individuals. This prediction is supported by exome sequence data showing that African American and European American individuals carry very similar burdens of damaging mutations. We next consider whether recent population growth has increased the importance of very rare mutations in complex traits. Our analysis predicts that for most classes of disease variants, rare alleles are unlikely to contribute a large fraction of the total genetic variance, and that the impact of recent growth is likely to be modest. However, for diseases that have a direct impact on fitness, strongly deleterious rare mutations likely do play important roles, and the impact of very rare mutations will be far greater as a result of recent growth. In summary, demographic history has dramatically impacted patterns of variation in different human populations, but these changes have likely had little impact on either genetic load or on the importance of rare variants for most complex traits.

Statistical Physics of Evolutionary Trajectories on Fitness Landscapes

Michael Manhart, Alexandre V. Morozov
(Submitted on 6 May 2013)

Random walks on multidimensional nonlinear landscapes are of interest in many areas of science and engineering. In particular, properties of adaptive trajectories on fitness landscapes determine population fates and thus play a central role in evolutionary theory. The topography of fitness landscapes and its effect on evolutionary dynamics have been extensively studied in the literature. We will survey the current research knowledge in this field, focusing on a recently developed systematic approach to characterizing path lengths, mean first-passage times, and other statistics of the path ensemble. This approach, based on general techniques from statistical physics, is applicable to landscapes of arbitrary complexity and structure. It is especially well-suited to quantifying the diversity of stochastic trajectories and repeatability of evolutionary events. We demonstrate this methodology using a biophysical model of protein evolution that describes how proteins maintain stability while evolving new functions.

Critical case stochastic phylogenetic tree model via the Laplace transform

Krzysztof Bartoszek, Michal Krzeminski
(Submitted on 30 Apr 2013)

Birth-and-death models are now a common mathematical tool to describe branching patterns observed in real-world phylogenetic trees. Liggett and Schinazi (2009) is one such example. The authors propose a simple birth-and-death model that is compatible with phylogenetic trees of both influenza and HIV, depending on the birth rate parameter. An interesting special case of this model is the critical case where the birth rate equals the death rate. This is a non-trivial situation and to study its asymptotic behaviour we employed the Laplace transform. With this we correct the proof of Liggett and Schinazi (2009) in the critical case.

The Expected Linkage Disequilibrium in Finite Populations Revisited

Ulrike Ober, Alexander Malinowski, Martin Schlather, Henner Simianer
(Submitted on 17 Apr 2013)

The expected level of linkage disequilibrium (LD) in a finite ideal population at equilibrium is of relevance for many applications in population and quantitative genetics. Several recursion formulae have been proposed during the last decades, whose derivations mostly contain heuristic parts and therefore remain mathematically questionable. We propose a more justifiable approach, including an alternative recursion formula for the expected LD. Since the exact formula depends on the distribution of allele frequencies in a very complicated manner, we suggest an approximate solution and analyze its validity extensively in a simulation study. Compared to the widely used formula of Sved, the proposed formula performs better for all parameter constellations considered. We then analyze the expected LD at equilibrium using the theory of discrete-time Markov chains based on the linear recursion formula, with equilibrium being defined as the steady state of the chain, which finally leads to a formula for the effective population size N_e. An additional analysis considers the effect of non-exactness of a recursion formula on the steady state, demonstrating that the resulting error in expected LD can be substantial. In an application to the HapMap data of two human populations we illustrate the dependency of the N_e-estimate on the distribution of minor allele frequencies (MAFs), showing that the estimated N_e can vary by up to 30% when a uniform instead of a skewed distribution of MAFs is taken as a basis to select SNPs for the analyses. Our analyses provide new insights into the mathematical complexity of the problem studied.
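
For orientation, the sketch below shows the widely used approximation attributed to Sved that the abstract compares against, E[r^2] ≈ 1 / (1 + 4 N_e c), together with its inversion to estimate N_e. It does not implement the authors' alternative recursion, which the abstract reports to perform better. The function names are assumptions introduced here, and `c` is taken to be the recombination rate between the two loci per generation.

```python
def sved_expected_r2(n_e, c):
    """Sved's approximation for the expected LD (r^2) at equilibrium.

    n_e: effective population size
    c:   recombination rate between the two loci per generation
    """
    return 1.0 / (1.0 + 4.0 * n_e * c)


def sved_effective_size(r2, c):
    """Invert Sved's approximation to estimate N_e from an observed mean r^2."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must lie in (0, 1]")
    if c <= 0.0:
        raise ValueError("c must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c)
```

With `n_e = 1000` and `c = 0.001` the approximation gives an expected r^2 of about 0.2, and feeding that value back into `sved_effective_size` recovers an N_e of about 1000. This sketch makes no correction for sample size or for the allele frequency distribution, which the abstract identifies as a source of variation in the N_e estimate.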

Identifiability of a Coalescent-based Population Tree Model

Arindam RoyChoudhury
(Submitted on 12 Apr 2013)

Identifiability of evolutionary tree models has been a recent topic of discussion and some models have been shown to be non-identifiable. A coalescent-based rooted population tree model, originally proposed by Nielsen et al. 1998 [2], has been used by many authors in the last few years and is a simple tool to accurately model the changes in allele frequencies in the tree. However, the identifiability of this model has never been proven. Here we prove this model to be identifiable by showing that the model parameters can be expressed as functions of the probability distributions of subsamples. This is a step toward proving the consistency of the maximum likelihood estimator of the population tree based on this model.

The Maintenance of Sex: Ronald Fisher meets the Red Queen

David Green, Chris Mason
(Submitted on 10 Apr 2013)

Sex in higher diploids carries a two-fold cost of males that should reduce its fitness relative to cloning and result in extinction. Instead, sex is widespread and it is clonal species that face early obsolescence. One possible reason is that sex is an adaptation to resist parasites. We use computer simulations of finite populations to model a Red Queen in which a parasitic haploid mounts a negative frequency-dependent attack on a diploid host. Both host and parasite populations generate novel alleles by mutation and have access to large allele spaces. Sex outcompetes cloning by two overlapping mechanisms. First, sexual diploids adopt advantageous homozygous mutations more rapidly than clonal diploids under conditions of lag load. This rate advantage can offset the lesser fecundity of sex. Second, a relative advantage to sex emerges under host mutation rates that are fast enough to retain fitness in a rapidly mutating parasite environment and increase host polymorphism and polyclonality. Polyclonal populations disproportionately experience interference with selection at high mutation rates, both between and within loci, slowing clonal population adaptation to a changing parasite environment and reducing clonal population fitness relative to sex. This effect increases markedly with the number of loci under independent selection. Rates of parasite mutation exist that not only allow sex to survive despite the two-fold cost of males but which enable sexual and clonal populations to have equal fitness and co-exist. Since all higher organisms carry parasitic loads, the model is of general applicability.