The epigenome of evolving Drosophila neo-sex chromosomes: dosage compensation and heterochromatin formation

Qi Zhou, Christopher E. Ellison, Vera B. Kaiser, Artyom A. Alekseyenko, Andrey A. Gorchakov, Doris Bachtrog
(Submitted on 26 Sep 2013)

Drosophila Y chromosomes are composed entirely of silent heterochromatin, while male X chromosomes have highly accessible chromatin and are hypertranscribed due to dosage compensation. Here, we dissect the molecular mechanisms and functional pressures driving heterochromatin formation and dosage compensation of the recently formed neo-sex chromosomes of Drosophila miranda. We show that the onset of heterochromatin formation on the neo-Y is triggered by an accumulation of repetitive DNA. The neo-X has evolved partial dosage compensation and we find that diverse mutational paths have been utilized to establish several dozen novel binding consensus motifs for the dosage compensation complex on the neo-X, including simple point mutations at pre-binding sites, insertion and deletion mutations, microsatellite expansions, or tandem amplification of weak binding sites. Spreading of these silencing or activating chromatin modifications to adjacent regions results in massive mis-expression of neo-sex linked genes, and little correspondence between functionality of genes and their silencing on the neo-Y or dosage compensation on the neo-X. Intriguingly, the genomic regions being targeted by the dosage compensation complex on the neo-X and those becoming heterochromatic on the neo-Y show little overlap, possibly reflecting different propensities along the ancestral chromosome to adopt active or repressive chromatin configurations. Our findings have broad implications for current models of sex chromosome evolution, and demonstrate how mechanistic constraints can limit evolutionary adaptations. Our study also highlights how evolution can follow predictable genetic trajectories, by repeatedly acquiring the same 21-bp consensus motif for recruitment of the dosage compensation complex, yet utilizing a diverse array of random mutational changes to attain the same phenotypic outcome.

A computational model for histone mark propagation reproduces the distribution of heterochromatin in different human cell types

Veit Schwämmle, Ole Nørregaard Jensen
(Submitted on 27 Sep 2013)

Chromatin is a highly compact and dynamic nuclear structure that consists of DNA and associated proteins. The main organizational unit is the nucleosome, which consists of a histone octamer with DNA wrapped around it. Histone proteins are implicated in the regulation of eukaryote genes and they carry numerous reversible post-translational modifications that control DNA-protein interactions and the recruitment of chromatin binding proteins. Heterochromatin, the transcriptionally inactive part of the genome, is densely packed and contains histone H3 that is methylated at Lys 9 (H3K9me). The propagation of H3K9me in nucleosomes along the DNA in chromatin is antagonized by methylation of H3 Lysine 4 (H3K4me) and acetylations of several lysines, which are associated with euchromatin and active genes. We show that the related histone modifications form antagonized domains on a coarse scale. These histone marks are assumed to be initiated within distinct nucleation sites in the DNA and to propagate bi-directionally. We propose a simple computer model that simulates the distribution of heterochromatin in human chromosomes. The simulations are in agreement with previously reported experimental observations from two different human cell lines. We reproduced different types of barriers between heterochromatin and euchromatin providing a unified model for their function. The effect of changes in the nucleation site distribution and of propagation rates were studied. The former occurs mainly with the aim of (de-)activation of single genes or gene groups and the latter has the power of controlling the transcriptional programs of entire chromosomes. Generally, the regulatory program of gene transcription is controlled by the distribution of nucleation sites along the DNA string.
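To make the idea of nucleation and bi-directional propagation concrete, here is a toy sketch in Python. It is not the authors' model: the lattice of nucleosomes, the single shared `rate`, the rule that a mark never overwrites another mark, and the function name `propagate_marks` are all assumptions made for illustration. Only the mark names H3K9me and H3K4me and the notion of nucleation sites come from the abstract.

```python
import random

def propagate_marks(length, nucleation_sites, rate=0.5, steps=500, seed=0):
    """Toy 1D spreading of antagonistic histone marks (illustrative only).

    length           -- number of nucleosomes on the lattice
    nucleation_sites -- dict mapping position -> mark name
    rate             -- per-step probability that a mark spreads to a free neighbour
    """
    rng = random.Random(seed)
    marks = [None] * length
    for pos, mark in nucleation_sites.items():
        marks[pos] = mark

    for _ in range(steps):
        updated = list(marks)
        for i, mark in enumerate(marks):
            if mark is None:
                continue
            # Spread in both directions, but only onto unmarked nucleosomes.
            # If two marks compete for the same nucleosome in one step,
            # the one processed first (lower index) wins in this sketch.
            for j in (i - 1, i + 1):
                if 0 <= j < length and updated[j] is None and rng.random() < rate:
                    updated[j] = mark
        marks = updated
    return marks

# Hypothetical nucleation sites: two heterochromatic, one euchromatic.
sites = {5: "H3K9me", 20: "H3K4me", 35: "H3K9me"}
final = propagate_marks(40, sites)
domains = "".join("H" if m == "H3K9me" else "E" if m == "H3K4me" else "." for m in final)
print(domains)
```

In this sketch each nucleation site ends up at the centre of one contiguous domain, and the boundaries between domains sit wherever two opposing fronts happened to meet. Changing the positions in `sites` moves individual domains, while changing `rate` per mark (not done here) would shift every boundary at once.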

Reliable reconstruction of HIV-1 whole genome haplotypes reveals clonal interference and genetic hitchhiking among immune escape variants

Aridaman Pandit, Rob J de Boer
(Submitted on 26 Sep 2013)

Following transmission, HIV-1 evolves into a diverse population, and next generation sequencing enables us to detect variants occurring at low frequencies. Studying viral evolution at the level of whole genomes was hitherto not possible because next generation sequencing delivers relatively short reads. We here provide a proof of principle that whole HIV-1 genomes can be reliably reconstructed from short reads, and use this to study the selection of immune escape mutations at the level of whole genome haplotypes. Using realistically simulated HIV-1 populations, we demonstrate that reconstruction of complete genome haplotypes is feasible with high fidelity. We do not reconstruct all genetically distinct genomes, but each reconstructed haplotype represents one or more of the quasispecies in the HIV-1 population. We then reconstruct 30 whole genome haplotypes from published short sequence reads sampled longitudinally from a single HIV-1 infected patient. We confirm the reliability of the reconstruction by validating our predicted haplotype genes with single genome amplification sequences, and by comparing haplotype frequencies with observed epitope escape frequencies. Phylogenetic analysis shows that the HIV-1 population undergoes selection driven evolution, with successive replacement of the viral population by novel dominant strains. We demonstrate that immune escape mutants evolve in a dependent manner with various mutations hitchhiking along with others. As a consequence of this clonal interference, selection coefficients have to be estimated for complete haplotypes and not for individual immune escapes.

Neutral genomic regions refine models of recent rapid human population growth

Elodie Gazave, Li Ma, Diana Chang, Alex Coventry, Feng Gao, Donna Muzny, Eric Boerwinkle, Richard Gibbs, Charles F. Sing, Andrew G. Clark, Alon Keinan
(Submitted on 25 Sep 2013)

Human populations have experienced dramatic growth since the Neolithic revolution. Recent studies that sequenced a very large number of individuals observed an extreme excess of rare variants, and provided clear evidence of recent rapid growth in effective population size, though estimates have varied greatly among studies. As medical applications drove the datasets therein, all studies were based on protein-coding genes, in which variants are also impacted by natural selection. In this study, we introduce targeted sequencing data for studying recent human history with minimal confounding by natural selection. We sequenced putatively neutral loci that are very far from genes and that meet a wide array of additional criteria. As population structure also skews allele frequencies, we sequenced a sample of relatively homogeneous ancestry by first analyzing the population structure of 9,716 European Americans. We employed very high coverage sequencing to reliably call rare variants, and fit an extensive array of models of recent European demographic history to the site frequency spectrum. The best-fit model estimates ~3.4% growth per generation during the last ~140 generations, resulting in a population size increase of two orders of magnitude. This model fits the data very well, largely due to our observation that assumptions of more ancient demography can impact estimates of recent growth. This observation and results also shed light on the discrepancy in demographic estimates among recent studies.
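As a quick sanity check on the "two orders of magnitude" figure, the compound growth can be worked out directly. The sketch below assumes simple exponential growth at a constant per-generation rate, using the rounded values quoted in the abstract; the variable names are ours.

```python
# Compound growth with the rounded values quoted in the abstract.
growth_rate = 0.034   # ~3.4% growth per generation
generations = 140     # ~140 generations

fold_increase = (1 + growth_rate) ** generations
print(f"fold increase: {fold_increase:.0f}")
```

The result is a bit over 100-fold, consistent with an increase of roughly two orders of magnitude.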

Some background on Bhaskar and Song’s paper: “The identifiability of piecewise demographic models from the sample frequency spectrum”

This post is by Graham Coop [@Graham_Coop], and is an introduction to the background of Anand Bhaskar and Yun Song’s preprint: “The identifiability of piecewise demographic models from the sample frequency spectrum”. arXived here.

Anand and Yun’s preprint focuses on what we can learn about population demographic history from the site frequency spectrum. It takes as its starting point an article by Myers et al (Myers, S., Fefferman, C., and Patterson, N. (2008) Can one learn history from the allelic spectrum? Theoretical Population Biology 73, 342–348.). I wrote about Myers et al’s article back in 2008, when it came out (see here). I thought I’d post an edited version of that post by way of additional background to Anand and Yun’s preprint and guest post.

Edited version of original post:
The best way to learn about demography from population genetic data is to look at patterns of diversity across many unlinked regions. The distribution of frequencies in a population of unlinked neutral alleles at SNPs (the site frequency spectrum) is potentially very informative about population history. For example, an excess of low frequency mutations is consistent with recent population growth, as the increase in population size introduces new mutations but these mutations have not yet had time to drift to higher frequencies. Many authors have made use of the frequency spectrum of unlinked, putatively neutral SNPs to learn about demography.
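For readers who have not worked with it, the site frequency spectrum is just a tally: for each SNP, count how many sampled chromosomes carry the derived allele, then count how many SNPs fall in each class. A minimal sketch, assuming the data are 0/1 haplotypes with the ancestral state known (the function name and the toy data are made up for illustration):

```python
from collections import Counter

def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i-1 is the number of SNPs whose derived
    allele (coded 1) is carried by exactly i of the n sampled chromosomes."""
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    derived_counts = Counter(
        sum(hap[site] for hap in haplotypes) for site in range(num_sites)
    )
    # Classes 0 and n are monomorphic in the sample and are left out.
    return [derived_counts.get(i, 0) for i in range(1, n)]

# Toy sample: 4 chromosomes (rows) at 5 sites (columns).
haplotypes = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
]
sfs = site_frequency_spectrum(haplotypes)
print(sfs)  # [2, 1, 1]: two singletons, one doubleton, one tripleton
```

An excess of singletons relative to the other classes is the kind of signal that points towards recent growth.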

Back in 2008, a technical but elegant article by Myers et al showed that, while informative about demography, the site frequency spectrum at unlinked SNPs cannot help you choose between certain demographic histories. This is not a question of imperfect knowledge of the site frequency spectrum (which more data would solve) but because for any particular demographic model, as Myers et al formally show, there is a large family of demographic histories that can give rise to the same site frequency spectrum. They explained: ‘Informally, changes in population size at some past time are canceled out by other changes in the opposite direction’. I think that this lack of information comes from the fact that each unlinked SNP only tells you about the placement of a single mutation on the genealogy of the population at that site, and over sites you learn about the expected amount of time in different parts of the genealogy. By fluctuating the population size in just the right way, i.e. speeding up and slowing down the rate of coalescence, we can get very different population histories to give us the same expected coalescent times.
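One way to see this is through the standard coalescent expression for the expected spectrum, in which the population size history enters only through the expected times E[T_k] during which the sample has k ancestral lineages. The sketch below uses that textbook formula, not anything taken from Myers et al's proofs; the function name, the choice of n = 10 and the time scaling (units of 2N generations, with theta = 4N mu) are assumptions of the example.

```python
from math import comb

def expected_sfs(n, theta, expected_times):
    """Expected unfolded SFS for a sample of n chromosomes.

    expected_times[k] is E[T_k], the expected time with k lineages,
    in units of 2N generations, so mutations arise at rate theta/2
    per lineage per unit time.
    """
    sfs = []
    for b in range(1, n):
        total = 0.0
        for k in range(2, n - b + 2):
            # Probability that a lineage present when there are k
            # lineages is ancestral to exactly b of the n samples.
            p = comb(n - b - 1, k - 2) / comb(n - 1, k - 1)
            total += k * p * expected_times[k]
        sfs.append(theta / 2 * total)
    return sfs

n, theta = 10, 1.0
# Constant population size: E[T_k] = 1 / C(k, 2) = 2 / (k (k - 1)).
constant_size_times = {k: 2 / (k * (k - 1)) for k in range(2, n + 1)}
constant_sfs = expected_sfs(n, theta, constant_size_times)
print([round(x, 4) for x in constant_sfs])
```

With constant size this recovers the familiar theta/i spectrum. Because the demography appears only through the n - 1 numbers E[T_2], ..., E[T_n], any history that produces the same expected times produces the same expected spectrum.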

For example, Myers et al showed that the population size history below gives the same population frequency spectrum as a constant population.
[Figure: population size history, taken from Myers, S., Fefferman, C., and Patterson, N. (2008) Can one learn history from the allelic spectrum? Theoretical Population Biology 73, 342–348.]

That’s somewhat worrying, as it says that when we fit a model of population size changes over time there is actually a family of quite different looking population histories that would give us just as good a fit to the data. However, these alternative histories do look quite strange and may not be biologically reasonable.

Anand and Yun’s article takes this as its starting point. They show that if we write our population history as a series of piecewise functions, then the parameters of these functions are identifiable, and they provide simple estimates of the sample size needed. You can read more about their results in their guest post here at Haldane’s sieve.

Chaos and Unpredictability in Evolution

Iaroslav Ispolatov, Michael Doebeli
(Submitted on 24 Sep 2013)

The possibility of complicated dynamic behaviour driven by non-linear feedbacks in dynamical systems has revolutionized science in the latter part of the last century. Yet despite examples of complicated frequency dynamics, the possibility of long-term evolutionary chaos is rarely considered. The concept of “survival of the fittest” is central to much evolutionary thinking and embodies a perspective of evolution as a directional optimization process exhibiting simple, predictable dynamics. This perspective is adequate for simple scenarios, when frequency-independent selection acts on scalar phenotypes. However, in most organisms many phenotypic properties combine in complicated ways to determine ecological interactions, and hence frequency-dependent selection. Therefore, it is natural to consider models for the evolutionary dynamics generated by frequency-dependent selection acting simultaneously on many different phenotypes. Here we show that complicated, chaotic dynamics of long-term evolutionary trajectories in phenotype space is very common in a large class of such models when the dimension of phenotype space is large, and when there are epistatic interactions between the phenotypic components. Our results suggest that the perspective of evolution as a process with simple, predictable dynamics covers only a small fragment of long-term evolution. Our analysis may also be the first systematic study of the occurrence of chaos in multidimensional and generally dissipative systems as a function of the dimensionality of phase space.

Fast Inference of Admixture Coefficients Using Sparse Non-negative Matrix Factorization Algorithms

Eric Frichot, François Mathieu, Théo Trouillon, Guillaume Bouchard, Olivier François
(Submitted on 24 Sep 2013)

Inference of individual admixture coefficients, which is important for population genetic and association studies, is commonly performed using compute-intensive likelihood algorithms. With the availability of large population genomic data sets, fast versions of likelihood algorithms have attracted considerable attention. Reducing the computational burden of estimation algorithms remains, however, a major challenge. Here, we present a fast and efficient method for estimating individual admixture coefficients based on sparse non-negative matrix factorization algorithms. We implemented our method in the computer program sNMF, and applied it to human and plant genomic data sets. The performances of sNMF were then compared to the likelihood algorithm implemented in the computer program ADMIXTURE. Without loss of accuracy, sNMF computed estimates of admixture coefficients within run-times approximately 10 to 30 times faster than those of ADMIXTURE.