Total internal and external lengths of the Bolthausen-Sznitman coalescent

Götz Kersting, Juan Carlos Pardo, Arno Siri-Jégousse
(Submitted on 6 Feb 2013)

In this paper, we study a weak law of large numbers for the total internal length of the Bolthausen-Sznitman coalescent. As a consequence, we obtain the weak limit law of the centered and rescaled total external length. The latter extends results obtained by Dhersin & Möhle \cite{DM12}. An application to population genetics dealing with the total number of mutations in the genealogical tree is also given.
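The quantities in this abstract can be explored numerically. The following is a minimal simulation sketch, not the authors' analytical method. It relies on the standard description of the Bolthausen-Sznitman coalescent, in which b lineages merge at total rate b - 1 and a merger involves k of them with probability b / ((b - 1) k (k - 1)). The function name `simulate_bs_lengths` and the representation of blocks by their leaf counts are illustrative choices.

```python
import random


def simulate_bs_lengths(n, rng):
    """Simulate one Bolthausen-Sznitman coalescent started from n lineages.

    Returns (external_length, internal_length) of the genealogical tree.
    """
    # Each block is represented by the number of sampled leaves it contains.
    blocks = [1] * n
    external = 0.0
    internal = 0.0
    while len(blocks) > 1:
        b = len(blocks)
        # With b blocks present, the total merger rate is b - 1.
        wait = rng.expovariate(b - 1)
        # Blocks of size 1 are external branches; all others are internal.
        singletons = sum(1 for size in blocks if size == 1)
        external += singletons * wait
        internal += (b - singletons) * wait
        # A merger involves k of the b blocks with probability
        # proportional to 1 / (k * (k - 1)), for k = 2..b.
        sizes = list(range(2, b + 1))
        weights = [1.0 / (k * (k - 1)) for k in sizes]
        k = rng.choices(sizes, weights=weights)[0]
        # The k merging blocks are chosen uniformly at random.
        chosen = set(rng.sample(range(b), k))
        merged = sum(blocks[i] for i in chosen)
        blocks = [s for i, s in enumerate(blocks) if i not in chosen]
        blocks.append(merged)
    return external, internal


if __name__ == "__main__":
    rng = random.Random(2013)
    runs = [simulate_bs_lengths(500, rng) for _ in range(200)]
    print("mean external length:", sum(e for e, _ in runs) / len(runs))
    print("mean internal length:", sum(i for _, i in runs) / len(runs))
```

Averaging over many replicates for increasing n gives a rough empirical view of how the two lengths scale, which can then be compared with the limit laws stated in the paper.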

Integrating genealogical and dynamical modelling to infer escape and reversion rates in HIV epitopes

Duncan Palmer, Angela McLean, Gil McVean
(Submitted on 5 Feb 2013)

The rates of escape and reversion in response to selection pressure arising from the host immune system, notably the cytotoxic T-lymphocyte (CTL) response, are key factors determining the evolution of HIV. Existing methods for estimating these parameters from cross-sectional population data using ordinary differential equations (ODE) ignore information about the genealogy of sampled HIV sequences, which has the potential to cause systematic bias and to overstate certainty. Here, we describe an integrated approach, validated through extensive simulations, which combines genealogical inference and epidemiological modelling to estimate rates of CTL escape and reversion in HIV epitopes. We show that there is substantial uncertainty about rates of viral escape and reversion from cross-sectional data, which arises from the inherent stochasticity in the evolutionary process. By application to empirical data, we find that point estimates of rates from a previously published ODE model and the integrated approach presented here are often similar, but can also differ several-fold depending on the structure of the genealogy. The model-based approach we apply provides a framework for the statistical analysis of escape and reversion in population data and highlights the need for longitudinal and denser cross-sectional sampling to enable accurate estimation of these key parameters.

Genetic draft, selective interference, and population genetics of rapid adaptation

Richard A. Neher
(Submitted on 5 Feb 2013)

To learn about the past from a sample of genomic sequences, one needs to understand how evolutionary processes shape genetic diversity. Most population genetic inference is based on frameworks assuming adaptive evolution is rare. But if positive selection operates on many loci simultaneously, as has recently been suggested for many species including animals such as flies, a different approach is necessary. In this review, I discuss recent progress in characterizing and understanding evolution in rapidly adapting populations where random associations of mutations with genetic backgrounds of different fitness, i.e., genetic draft, dominate over genetic drift. As a result, neutral genetic diversity depends weakly on population size, but strongly on the rate of adaptation or more generally the variance in fitness. Coalescent processes with multiple mergers, rather than Kingman’s coalescent, are appropriate genealogical models for rapidly adapting populations with important implications for population genetic inference.

Identifying Signatures of Selection in Genetic Time Series

Alison Feder, Sergey Kryazhimskiy, Joshua B. Plotkin
(Submitted on 3 Feb 2013)

We develop a rigorous test for natural selection based on allele frequencies sampled from a population over multiple time points. We demonstrate that the standard method of estimating selection coefficients in this setting, together with the associated chi-squared likelihood-ratio test of neutrality, is biased and therefore does not provide a reliable test of selection. We introduce two methods to correct this bias, and we demonstrate that the new methods have power to detect selection in practical parameter regimes, such as those encountered in fitness assays of microbial populations. Our analysis is limited to a single diallelic locus, assumed independent of all other loci in a genome, which is again relevant to simple competition assays of laboratory and natural isolates; other techniques will be required to detect selection in time series of co-segregating, linked loci.

The infinitely many genes model with horizontal gene transfer

Franz Baumdicker, Peter Pfaffelhuber
(Submitted on 28 Jan 2013)

The genome of bacterial species is much more flexible than that of eukaryotes. Moreover, the distributed genome hypothesis for bacteria states that the total number of genes present in a bacterial population is greater than the genome of every single individual. The pangenome, i.e. the set of all genes of a bacterial species (or a sample), comprises the core genes, which are present in all living individuals, and accessory genes, which are carried only by some individuals. In order to use accessory genes for adaptation to environmental forces, genes can be transferred horizontally between individuals. Here, we extend the infinitely many genes model from Baumdicker, Hess and Pfaffelhuber (2010) to include horizontal gene transfer. We take a genealogical view and give a construction, called the Ancestral Gene Transfer Graph, of the joint genealogy of all genes in the pangenome. As an application, we compute moments of several statistics (e.g. the number of differences between two individuals and the gene frequency spectrum) under the infinitely many genes model with horizontal gene transfer.

Natural selection. VI. Partitioning the information in fitness and characters by path analysis

Steven A. Frank
(Submitted on 22 Jan 2013)

Three steps aid in the analysis of selection. First, describe phenotypes by their component causes. Components include genes, maternal effects, symbionts, and any other predictors of phenotype that are of interest. Second, describe fitness by its component causes, such as an individual’s phenotype, its neighbors’ phenotypes, resource availability, and so on. Third, put the predictors of phenotype and fitness into an exact equation for evolutionary change, providing a complete expression of selection and other evolutionary processes. The complete expression separates the distinct causal roles of the various hypothesized components of phenotypes and fitness. Traditionally, those components are given by the covariance, variance, and regression terms of evolutionary models. I show how to interpret those statistical expressions with respect to information theory. The resulting interpretation allows one to read the fundamental equations of selection and evolution as sentences that express how various causes lead to the accumulation of information by selection and the decay of information by other evolutionary processes. The interpretation in terms of information leads to a deeper understanding of selection and heritability, and a clearer sense of how to formulate causal hypotheses about evolutionary process. Kin selection appears as a particular type of causal analysis that partitions social effects into meaningful components.

Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations

Katarzyna Bryc, Wlodek Bryc, Jack W. Silverstein
(Submitted on 18 Jan 2013)

We present a mathematical model, and the corresponding mathematical analysis, that justifies and quantifies the use of principal component analysis of biallelic genetic marker data for a set of individuals to detect the number of subpopulations represented in the data. We indicate that the power of the technique relies more on the number of individuals genotyped than on the number of markers.
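The idea of counting subpopulations from the leading eigenvalues can be illustrated with a small numerical sketch. This is a toy example under stated assumptions, not the model analysed in the paper: `simulate_genotypes` is a hypothetical generator that gives each subpopulation independent allele frequencies, and `top_eigenvalues` applies a plain standardise-then-eigendecompose recipe using NumPy.

```python
import numpy as np


def simulate_genotypes(n_per_pop, n_markers, n_pops, rng):
    """Toy biallelic genotypes (0, 1 or 2 copies) for discrete subpopulations."""
    # Assumption: allele frequencies are drawn independently per subpopulation.
    freqs = rng.uniform(0.1, 0.9, size=(n_pops, n_markers))
    rows = [
        rng.binomial(2, freqs[p], size=(n_per_pop, n_markers))
        for p in range(n_pops)
    ]
    return np.vstack(rows)


def top_eigenvalues(genotypes, how_many):
    """Largest eigenvalues of the individual-by-individual covariance matrix."""
    # Drop monomorphic markers, then center and scale each marker (column).
    g = genotypes[:, genotypes.std(axis=0) > 0].astype(float)
    g = (g - g.mean(axis=0)) / g.std(axis=0)
    cov = g @ g.T / g.shape[1]
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(cov)[::-1][:how_many]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = simulate_genotypes(n_per_pop=40, n_markers=2000, n_pops=3, rng=rng)
    print(top_eigenvalues(data, 5))
```

In this toy setting, K well-differentiated subpopulations typically produce K - 1 eigenvalues that stand clear of the rest. Varying `n_per_pop` and `n_markers` separately is one informal way to probe the abstract's claim about which of the two matters more.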

Evolution of molecular phenotypes under stabilizing selection

Armita Nourmohammad, Stephan Schiffels, Michael Laessig
(Submitted on 17 Jan 2013)

Molecular phenotypes are important links between genomic information and organismic functions, fitness, and evolution. Complex phenotypes, which are also called quantitative traits, often depend on multiple genomic loci. Their evolution builds on genome evolution in a complicated way, which involves selection, genetic drift, mutations and recombination. Here we develop coarse-grained evolutionary statistics for phenotypes, which decouple from details of the underlying genotypes. We derive approximate evolution equations for the distribution of phenotype values within and across populations. These dynamics cover evolutionary processes at high and low recombination rates, that is, they apply to sexual and asexual populations. In a fitness landscape with a single optimal phenotype value, the phenotypic diversity within populations and the divergence between populations reach evolutionary equilibria, which describe stabilizing selection. We compute the equilibrium distributions of both quantities analytically and we show that the ratio of mean divergence and diversity depends on the strength of selection in a universal way: it is largely independent of the phenotype’s genomic encoding and of the recombination rate. This establishes a new method for the inference of selection on molecular phenotypes beyond the genome level. We discuss the implications of our findings for the predictability of evolutionary processes.

Dynamics of adaptation: extreme value domains, distance to fitness optimum and fitness correlations

Sarada Seetharaman, Kavita Jain
(Submitted on 8 Jan 2013)

We study the properties of the adaptive walk performed by a maladapted asexual population in which beneficial mutations fix sequentially until a local fitness peak is reached. Here we consider three factors that govern the adaptation dynamics: the extreme value domain of beneficial mutations, the initial distance to the local fitness optimum, and the correlations amongst the fitnesses. We show that there is a transition in the behaviour of the walk length and of the average fitness fixed during adaptation when the mean and variance of the fitness distribution, respectively, become infinite. When the mean is finite, walk length decreases logarithmically with initial fitness but is a constant otherwise. We also find that the walks are longer for faster decaying fitness distributions and correlated fitnesses. For fitness distributions with finite variance, the fitness fixed during initial steps does not depend on the fitness of the local optimum but increases with the local peak fitness otherwise. Interestingly, the fitness difference between successive steps shows a pattern of diminishing returns for bounded distributions and accelerating returns for fat-tailed distributions. These trends are found to be robust with respect to fitness correlations.
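An adaptive walk of this kind can be sketched in a few lines. The version below is a simplified illustration under assumptions that go beyond the abstract: fitnesses are uncorrelated, every step sees a fresh set of i.i.d. neighbour fitnesses, and the next step is chosen among the fitter neighbours with probability proportional to the fitness gain. The names `adaptive_walk` and `draw_fitness` are hypothetical.

```python
import random


def adaptive_walk(n_neighbors, initial_fitness, draw_fitness, rng):
    """Return the list of fitness values visited by one adaptive walk."""
    path = [initial_fitness]
    while True:
        current = path[-1]
        # Assumption: uncorrelated landscape, fresh i.i.d. neighbours each step.
        neighbors = [draw_fitness(rng) for _ in range(n_neighbors)]
        fitter = [f for f in neighbors if f > current]
        if not fitter:
            # No beneficial mutation is available: a local peak is reached.
            return path
        # Assumption: a fitter neighbour fixes with probability proportional
        # to its fitness advantage over the current genotype.
        gains = [f - current for f in fitter]
        path.append(rng.choices(fitter, weights=gains)[0])


if __name__ == "__main__":
    rng = random.Random(8)
    # Exponential fitnesses; swap in a bounded or fat-tailed sampler to
    # compare extreme value domains.
    walks = [
        adaptive_walk(1000, 0.0, lambda r: r.expovariate(1.0), rng)
        for _ in range(200)
    ]
    print("mean number of steps:", sum(len(w) - 1 for w in walks) / len(walks))
```

Replacing the sampler passed as `draw_fitness` (for example with `r.random()` for a bounded distribution or `r.paretovariate(1.5)` for a fat-tailed one) and changing `initial_fitness` gives a quick, informal way to look at the walk-length and step-size trends described in the abstract.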

Selection biases the prevalence and type of epistasis along adaptive trajectories

Jeremy A. Draghi, Joshua B. Plotkin
(Submitted on 17 Dec 2012)

The contribution to an organism’s phenotype from one genetic locus may depend upon the status of other loci. Such epistatic interactions among loci are now recognized as fundamental to shaping the process of adaptation in evolving populations. Although little is known about the structure of epistasis in most organisms, recent experiments with bacterial populations have concluded that antagonistic interactions abound and tend to decelerate the pace of adaptation over time. Here, we use a broad class of mathematical fitness landscapes to examine how natural selection biases the mutations that substitute during evolution based on their epistatic interactions. We find that, even when beneficial mutations are rare, these biases are strong and change substantially throughout the course of adaptation. In particular, epistasis is less prevalent than the neutral expectation early in adaptation and much more prevalent later, with a concomitant shift from predominantly antagonistic interactions early in adaptation to synergistic and sign epistasis later in adaptation. We observe the same patterns when re-analyzing data from a recent microbial evolution experiment. Since these biases depend on the population size and other parameters, they must be quantified before we can hope to use experimental data to infer an organism’s underlying fitness landscape or to understand the role of epistasis in shaping its adaptation. In particular, we show that when the order of substitutions is not known to an experimentalist, standard methods of analysis may suggest that epistasis retards adaptation when in fact it accelerates it.