Dualities in population genetics: a fresh look with new dualities



Gioia Carinci, Cristian Giardinà, Claudio Giberti, Frank Redig
(Submitted on 13 Feb 2013)

We apply our general method of duality, introduced in [Giardinà, Kurchan, Redig, J. Math. Phys. 48, 033301 (2007)], to models of population dynamics. The classical dualities between forward and ancestral processes can be viewed as a change of representation in the classical creation and annihilation operators, both for diffusions dual to coalescents of Kingman’s type and for models with finite population size. Next, using SU(1,1) raising and lowering operators, we find new dualities between the Wright-Fisher diffusion with $d$ types and the Moran model, both in the presence and in the absence of mutations. These new dualities relate two forward evolutions. From our general scheme we also identify self-duality of the Moran model.
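
For orientation, the classical forward/ancestral duality that the abstract refers to can be written out in its simplest case. This is the standard textbook statement for the neutral two-type Wright-Fisher diffusion and Kingman's coalescent, not a formula taken from the paper; the symbols $X_t$, $N_t$, $\mathcal{L}$ and $D$ are notation assumed here. Let $X_t$ be the diffusion with generator $\mathcal{L}f(x) = \tfrac{1}{2}x(1-x)f''(x)$, and let $N_t$ be the block-counting process of Kingman's coalescent, which jumps from $n$ to $n-1$ at rate $\binom{n}{2}$. Taking the duality function $D(x,n) = x^n$, one checks that

$$
\mathcal{L}\,x^n = \tfrac{1}{2}x(1-x)\,n(n-1)\,x^{n-2} = \binom{n}{2}\left(x^{n-1} - x^{n}\right),
$$

which is exactly the generator of $N_t$ acting on $n \mapsto x^n$. The moment duality follows:

$$
\mathbb{E}_x\!\left[X_t^{\,n}\right] = \mathbb{E}_n\!\left[x^{N_t}\right].
$$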

Population genetics of neutral mutations in exponentially growing cancer cell populations

Rick Durrett
(Submitted on 12 Feb 2013)

In order to analyze data from cancer genome sequencing projects, we need to be able to distinguish causative, or “driver,” mutations from “passenger” mutations that have no selective effect. Toward this end, we prove results concerning the frequency of neutral mutations in exponentially growing multitype branching processes that have been widely used in cancer modeling. Our results yield a simple new population genetics result for the site frequency spectrum of a sample from an exponentially growing population.

Population Genetics of Rare Variants and Complex Diseases

M. Cyrus Maher, Lawrence H. Uricchio, Dara G. Torgerson, Ryan D. Hernandez
(Submitted on 12 Feb 2013)

Identifying drivers of complex traits from the noisy signals of genetic variation obtained from high-throughput genome sequencing technologies is a central challenge faced by human geneticists today. We hypothesize that the variants involved in complex diseases are likely to exhibit non-neutral evolutionary signatures. Uncovering the evolutionary history of all variants is therefore of intrinsic interest for complex disease research. However, doing so necessitates the simultaneous elucidation of the targets of natural selection and population-specific demographic history. Here we characterize the action of natural selection operating across complex disease categories, and use population genetic simulations to evaluate the expected patterns of genetic variation in large samples. We focus on populations that have experienced historical bottlenecks followed by explosive growth (consistent with most human populations), and describe the differences between evolutionarily deleterious mutations and those that are neutral. Genes associated with several complex disease categories exhibit stronger signatures of purifying selection than non-disease genes. In addition, loci identified through genome-wide association studies of complex traits also exhibit signatures consistent with being in regions recurrently targeted by purifying selection. Through simulations, we show that population bottlenecks and rapid growth enable deleterious rare variants to persist at low frequencies just as long as neutral variants, but low frequency and common variants tend to be much younger than neutral variants. This has resulted in a large proportion of modern-day rare alleles that have a deleterious effect on function, and that potentially contribute to disease susceptibility.

Our Paper: Transcript length mediates developmental timing of gene expression across Drosophila.

This guest post is a commentary by Carlo Artieri on “Transcript length mediates developmental timing of gene expression across Drosophila” by Artieri, C.G. and H.B. Fraser. The preprint is arXived here.

We have recently posted a preprint manuscript to arXiv that tests a decades-old hypothesis about how biological aspects of development constrain gene structure, using several genome-scale transcriptional timecourses, and interprets its effects in the context of Drosophila evolution. The paper may be of particular interest to researchers using genomic data in evo-devo studies.

During the early stages of identification and characterization of homeobox domain (HOX) genes and their related regulators, it was noted that they activated in a temporally sequential manner roughly correlated to their pre-mRNA transcript length (i.e., short genes express early, followed by longer genes). This led to the hypothesis that this pattern was produced by a purely physical mechanism (Gubb 1986): genes with long pre-mRNAs cannot complete transcription in the interval between the rapid cell cycles taking place during early insect development, leading to abortive, non-functional transcripts. As long pre-mRNAs result primarily from long introns, this was termed ‘Intron Delay’.
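
The physical argument behind intron delay is simple arithmetic: a transcript can only be completed if its length divided by the polymerase elongation rate fits inside the available window. The sketch below illustrates that reasoning only. The function names are hypothetical, and the elongation rate (1.5 kb/min) and window length (10 min) are placeholder values assumed for illustration, not figures from the paper.

```python
def transcription_minutes(pre_mrna_kb, elongation_kb_per_min):
    """Time needed to transcribe a pre-mRNA of the given length."""
    return pre_mrna_kb / elongation_kb_per_min


def can_complete(pre_mrna_kb, window_min, elongation_kb_per_min):
    """True if transcription finishes within the available window."""
    return transcription_minutes(pre_mrna_kb, elongation_kb_per_min) <= window_min


def max_transcript_kb(window_min, elongation_kb_per_min):
    """Longest pre-mRNA that can be completed within the window."""
    return window_min * elongation_kb_per_min


# Assumed, illustrative parameters (not taken from the paper).
ELONGATION_KB_PER_MIN = 1.5
WINDOW_MIN = 10.0

for length_kb in (3.0, 30.0):
    ok = can_complete(length_kb, WINDOW_MIN, ELONGATION_KB_PER_MIN)
    print(f"{length_kb} kb pre-mRNA completes in window: {ok}")
```

Under these assumed values a 3 kb pre-mRNA finishes comfortably while a 30 kb one is aborted, which is the qualitative pattern the hypothesis predicts; the real cutoff depends on the actual elongation rate and cycle length.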

We explored patterns of expression of genes in D. melanogaster over two embryonic timescales: eight time points spanning the latter part of the early embryonic ‘syncytial cycles’, during which the most rapid cell cycles take place, and 12 time points spanning the ~24 hours of embryogenesis. Long genes (≥ 5 kb long pre-mRNA transcripts) expressed from the zygotic genome showed a lag in the time required to reach stable levels of expression relative to short genes (< 5 kb) in both timecourses; in fact, stable expression of long genes did not occur until ~12 hours into embryogenesis, or midway between fertilization and emergence of the larva from the egg. No such pattern was observed among long or short genes that are maternally deposited in the embryo, as is expected if inability to terminate transcription is the driving mechanism behind this delay. Additional embryonic timecourse data from RNA-Seq libraries generated from non-poly-A-selected total RNA, and therefore not biased towards capture of processed RNAs, showed that only long zygotic genes expressed during the earliest developmental time points show a marked deficiency in 3’ relative to 5’ derived reads. This is consistent with their inability to terminate transcription, but not with transcriptional delay due to reduced transcriptional activation during early development.

The analysis was extended using developmental expression data from three additional Drosophila species spanning ~60 million years of evolution and showed that this pattern of delayed expression of long zygotically expressed genes is conserved across the phylogeny. This led us to predict that short zygotically expressed genes that are conserved in their ability to escape intron delay would be under substantial evolutionary pressure to maintain their compact lengths, and we found that this was the case when compared to long zygotic or either short or long maternally deposited genes.

We suggest that intron delay is an underappreciated mechanism affecting the expression level of a substantial fraction of the Drosophila embryonic transcriptome (~10%) and acts as a source of significant constraint on the structural evolution of important developmental genes.

References:
Gubb D. 1986. Intron‐delay and the precision of expression of homoeotic gene products in Drosophila. Developmental Genetics 7: 119–131

Total internal and external lengths of the Bolthausen-Sznitman coalescent

Götz Kersting, Juan Carlos Pardo, Arno Siri-Jégousse
(Submitted on 6 Feb 2013)

In this paper, we study a weak law of large numbers for the total internal length of the Bolthausen-Sznitman coalescent. As a consequence, we obtain the weak limit law of the centered and rescaled total external length. The latter extends results obtained by Dhersin & Möhle [DM12]. An application to population genetics dealing with the total number of mutations in the genealogical tree is also given.

Integrating genealogical and dynamical modelling to infer escape and reversion rates in HIV epitopes

Duncan Palmer, Angela McLean, Gil McVean
(Submitted on 5 Feb 2013)

The rates of escape and reversion in response to selection pressure arising from the host immune system, notably the cytotoxic T-lymphocyte (CTL) response, are key factors determining the evolution of HIV. Existing methods for estimating these parameters from cross-sectional population data using ordinary differential equations (ODE) ignore information about the genealogy of sampled HIV sequences, which has the potential to cause systematic bias and over-estimate certainty. Here, we describe an integrated approach, validated through extensive simulations, which combines genealogical inference and epidemiological modelling, to estimate rates of CTL escape and reversion in HIV epitopes. We show that there is substantial uncertainty about rates of viral escape and reversion from cross-sectional data, which arises from the inherent stochasticity in the evolutionary process. By application to empirical data, we find that point estimates of rates from a previously published ODE model and the integrated approach presented here are often similar, but can also differ several-fold depending on the structure of the genealogy. The model-based approach we apply provides a framework for the statistical analysis of escape and reversion in population data and highlights the need for longitudinal and denser cross-sectional sampling to enable accurate estimation of these key parameters.

Genetic draft, selective interference, and population genetics of rapid adaptation

Richard A. Neher
(Submitted on 5 Feb 2013)

To learn about the past from a sample of genomic sequences, one needs to understand how evolutionary processes shape genetic diversity. Most population genetic inference is based on frameworks assuming adaptive evolution is rare. But if positive selection operates on many loci simultaneously, as has recently been suggested for many species including animals such as flies, a different approach is necessary. In this review, I discuss recent progress in characterizing and understanding evolution in rapidly adapting populations where random associations of mutations with genetic backgrounds of different fitness, i.e., genetic draft, dominate over genetic drift. As a result, neutral genetic diversity depends weakly on population size, but strongly on the rate of adaptation or more generally the variance in fitness. Coalescent processes with multiple mergers, rather than Kingman’s coalescent, are appropriate genealogical models for rapidly adapting populations with important implications for population genetic inference.

Identifying Signatures of Selection in Genetic Time Series

Alison Feder, Sergey Kryazhimskiy, Joshua B. Plotkin
(Submitted on 3 Feb 2013)

We develop a rigorous test for natural selection based on allele frequencies sampled from a population over multiple time points. We demonstrate that the standard method of estimating selection coefficients in this setting, and the associated chi-squared likelihood-ratio test of neutrality, is biased and therefore does not provide a reliable test of selection. We introduce two methods to correct this bias, and we demonstrate that the new methods have power to detect selection in practical parameter regimes, such as those encountered in fitness assays of microbial populations. Our analysis is limited to a single diallelic locus, assumed independent of all other loci in a genome, which is again relevant to simple competition assays of laboratory and natural isolates; other techniques will be required to detect selection in time series of co-segregating, linked loci.

Equitability Analysis of the Maximal Information Coefficient, with Comparisons

David Reshef (1), Yakir Reshef (1), Michael Mitzenmacher (2), Pardis Sabeti (2) (1, 2 – contributed equally)
(Submitted on 27 Jan 2013)

A measure of dependence is said to be equitable if it gives similar scores to equally noisy relationships of different types. Equitability is important in data exploration when the goal is to identify a relatively small set of strongest associations within a dataset as opposed to finding as many non-zero associations as possible, which often are too many to sift through. Thus an equitable statistic, such as the maximal information coefficient (MIC), can be useful for analyzing high-dimensional data sets. Here, we explore both equitability and the properties of MIC, and discuss several aspects of the theory and practice of MIC. We begin by presenting an intuition behind the equitability of MIC through the exploration of the maximization and normalization steps in its definition. We then examine the speed and optimality of the approximation algorithm used to compute MIC, and suggest some directions for improving both. Finally, we demonstrate in a range of noise models and sample sizes that MIC is more equitable than natural alternatives, such as mutual information estimation and distance correlation.

The infinitely many genes model with horizontal gene transfer

Franz Baumdicker, Peter Pfaffelhuber
(Submitted on 28 Jan 2013)

The genome of bacterial species is much more flexible than that of eukaryotes. Moreover, the distributed genome hypothesis for bacteria states that the total number of genes present in a bacterial population is greater than the genome of every single individual. The pangenome, i.e. the set of all genes of a bacterial species (or a sample), comprises the core genes, which are present in all living individuals, and accessory genes, which are carried only by some individuals. In order to use accessory genes for adaptation to environmental forces, genes can be transferred horizontally between individuals. Here, we extend the infinitely many genes model from Baumdicker, Hess and Pfaffelhuber (2010) to include horizontal gene transfer. We take a genealogical view and give a construction, called the Ancestral Gene Transfer Graph, of the joint genealogy of all genes in the pangenome. As an application, we compute moments of several statistics (e.g. the number of differences between two individuals and the gene frequency spectrum) under the infinitely many genes model with horizontal gene transfer.