An experimentally determined evolutionary model dramatically improves phylogenetic fit

An experimentally determined evolutionary model dramatically improves phylogenetic fit
Jesse D Bloom

All modern approaches to molecular phylogenetics require a quantitative model for how genes evolve. Unfortunately, existing evolutionary models do not realistically represent the site-heterogeneous selection that governs actual sequence change. Attempts to remedy this problem have involved augmenting these models with a burgeoning number of free parameters. Here I demonstrate an alternative: experimental determination of a parameter-free evolutionary model via mutagenesis, functional selection, and deep sequencing. Using this strategy, I create an evolutionary model for influenza nucleoprotein that describes the gene phylogeny far better than existing models with dozens or even hundreds of free parameters. High-throughput experimental strategies such as the one employed here provide fundamentally new information that has the potential to transform the sensitivity of phylogenetic analyses.

motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences

motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences
Dennis Kostka, Tara Friedrich, Alisha K. Holloway, Katherine S. Pollard
(Submitted on 1 Feb 2014)

Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing if two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by providing TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species; but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain. Assuming that motif counts are Binomially distributed, and allowing for dependencies between motif instances in evolutionarily related sequences, we derive the probability mass function of the difference in motif counts between two nucleotide sequences. We provide a method to numerically estimate this distribution from genomic data and show through simulations that our estimator is accurate. Finally, we introduce the R package {\tt motifDiverge} that implements our methodology and illustrate its application to gene regulatory enhancers identified by a mouse developmental time course experiment. While this study was motivated by analysis of regulatory motifs, our results can be applied to any problem involving two correlated Bernoulli trials.

Evolution at two levels of gene expression in yeast

Evolution at two levels of gene expression in yeast
Carlo G. Artieri, Hunter B. Fraser
(Submitted on 27 Nov 2013)

Despite the greater functional importance of protein levels, our knowledge of gene expression evolution is based almost entirely on studies of mRNA levels. In contrast, our understanding of how translational regulation evolves has lagged far behind. Here we have applied ribosome profiling – which measures both global mRNA levels and their translation rates – to two species of Saccharomyces yeast and their interspecific hybrid in order to assess the relative contributions of changes in mRNA abundance and translation to regulatory evolution. We report that both cis and trans-acting regulatory divergence in translation are abundant, affecting at least 35% of genes. The majority of translational divergence acts to buffer changes in mRNA abundance, suggesting a widespread role for stabilizing selection acting across regulatory levels. Nevertheless, we observe evidence of lineage-specific selection acting on a number of yeast functional modules, including instances of reinforcing selection acting at both levels of regulation. Finally, we also uncover multiple instances of stop-codon readthrough that are conserved between species. Our analysis reveals the under-appreciated complexity of post-transcriptional regulatory divergence and indicates that partitioning the search for the locus of selection into the binary categories of ‘coding’ vs. ‘regulatory’ may overlook a significant source of selection, acting at multiple regulatory levels along the path from genotype to phenotype.

Patterns of positive selection in seven ant genomes

Patterns of positive selection in seven ant genomes

Julien Roux, Eyal Privman, Sebastien Moretti, Josephine T. Daub, Marc Robinson-Rechavi, Laurent Keller
(Submitted on 19 Nov 2013)

The evolution of ant species is marked by remarkable adaptations that allowed the development of very complex social systems. To identify how ant-specific adaptations are associated with specific patterns of molecular evolution we searched for signs of positive selection on amino-acid changes in proteins during the evolution of the ant lineage. We identified 24 functional categories of genes which were enriched for positively selected genes in the ant lineage. We also reanalyzed genome-wide dataset in bees and flies with the same methodology to check if genes under positive selection in ants were also under positive selection in the other analyzed lineages. Notably, genes implicated in immunity were enriched for positively selected genes in the three lineages, ruling out the hypothesis that the evolution of hygienic behaviors in social insects caused a major relaxation of selective pressure on this set of genes. Our scan also indicated that genes implicated in neurogenesis and olfaction started to undergo increased positive selection before the evolution of sociality in Hymenoptera, although it is assumed that the main challenges of the olfactory and neural systems in this lineage occurred with the evolution of social living. Finally, the comparison between these three lineages allowed us to pinpoint molecular evolution patterns that were specific to the ant lineage. In particular, there was relaxed selective pressure for genes related to metabolism in ants but not in bees and flies, possibly reflecting the loss of flight in ant workers. By contrast, there was recurrent positive selection on genes with mitochondrial functions specifically in ants, suggesting that the activity of mitochondria was improved during ant evolution. This might have been an important step toward the evolution of extreme lifespan that is a hallmark of this lineage.

A stochastic microscopic model for the dynamics of antigenic variation

A stochastic microscopic model for the dynamics of antigenic variation

Gustavo Guerberoff, Fernando Alvarez-Valin
(Submitted on 8 Nov 2013)

We present a novel model that describes the within-host evolutionary dynamics of parasites undergoing antigenic variation. The approach uses a multi-type branching process with two types of entities defined according to their relationship with the immune system: clans of resistant parasitic cells (i.e. groups of cells sharing the same antigen not yet recognized by the immune system) that may become sensitive, and individual sensitive cells that can acquire a new resistance thus giving rise to the emergence of a new clan. The simplicity of the model allows analytical treatment to determine the subcritical and supercritical regimes in the space of parameters. By incorporating a density-dependent mechanism the model is able to capture additional relevant features observed in experimental data, such as the characteristic parasitemia waves. In summary our approach provides a new general framework to address the dynamics of antigenic variation which can be easily adapted to cope with broader and more complex situations.

Mapping of the Influenza-A Hemagglutinin Serotypes Evolution by the ISSCOR Method

Mapping of the Influenza-A Hemagglutinin Serotypes Evolution by the ISSCOR Method
Jan P. Radomski, Piotr P. Slonimski, Włodzimierz Zagórski-Ostoja, Piotr Borowicz
(Submitted on 8 Nov 2013)

Analyses and visualizations by the ISSCOR method of influenza virus hemagglutinin genes of different A-subtypes revealed some rather striking temporal relationships between groups of individual gene subsets. Based on these findings we consider application of the ISSCOR-PCA method for analyses of large sets of homologous genes to be a worthwhile addition to a toolbox of genomics – allowing for a rapid diagnostics of trends, and ultimately even aiding an early warning of newly emerging epidemiological threats.

The hemagglutinin mutation E391K of pandemic 2009 influenza revisited

The hemagglutinin mutation E391K of pandemic 2009 influenza revisited
Jan P. Radomski, Piotr Płoński, Włodzimierz Zagórski-Ostoja
(Submitted on 8 Nov 2013)

Phylogenetic analyses based on small to moderately sized sets of sequential data lead to overestimating mutation rates in influenza hemagglutinin (HA) by at least an order of magnitude. Two major underlying reasons are: the incomplete lineage sorting, and a possible absence in the analyzed sequences set some of key missing ancestors. Additionally, during neighbor joining tree reconstruction each mutation is considered equally important, regardless of its nature. Here we have implemented a heuristic method optimizing site dependent factors weighting differently 1st, 2nd, and 3rd codon position mutations, allowing to extricate incorrectly attributed sub-clades. The least squares regression analysis of distribution of frequencies for all mutations observed on a partially disentangled tree for a large set of unique 3243 HA sequences, along all nucleotide positions, was performed for all mutations as well as for non-equivalent amino acid mutations: in both cases demonstrating almost flat gradients, with a very slight downward slope towards the 3′-end positions. The mean mutation rates per sequence per year were 3.83*10^-4 for the all mutations, and 9.64*10^-5 for the non-equivalent ones.

Can we predict the mutation rate at the single nucleotide scale in the human genome?

Can we predict the mutation rate at the single nucleotide scale in the human genome?
Adam Eyre-Walker, Ying Chen
(Submitted on 29 Oct 2013)

It has been recently claimed that it is possible to predict the rate of de novo mutation of each site in the human genome with almost perfect accuracy (Michaelson et al. (2012) Cell, 151, 1431-1442). We show that this claim is unwarranted. By considering the correlation between the rate of de novo mutation and the predictions from the model of Michaelson et al., we show that there could be substantial unexplained variance in the mutation rate. We also demonstrate that the model of Michaelson et al. fails to capture a major component of the variation in the mutation rate, that which is local but not associated with simple context.

The epigenome of evolving Drosophila neo-sex chromosomes: dosage compensation and heterochromatin formation

The epigenome of evolving Drosophila neo-sex chromosomes: dosage compensation and heterochromatin formation
Qi Zhou, Christopher E. Ellison, Vera B. Kaiser, Artyom A. Alekseyenko, Andrey A. Gorchakov, Doris Bachtrog
(Submitted on 26 Sep 2013)

Drosophila Y chromosomes are composed entirely of silent heterochromatin, while male X chromosomes have highly accessible chromatin and are hypertranscribed due to dosage compensation. Here, we dissect the molecular mechanisms and functional pressures driving heterochromatin formation and dosage compensation of the recently formed neo-sex chromosomes of Drosophila miranda. We show that the onset of heterochromatin formation on the neo-Y is triggered by an accumulation of repetitive DNA. The neo-X has evolved partial dosage compensation and we find that diverse mutational paths have been utilized to establish several dozen novel binding consensus motifs for the dosage compensation complex on the neo-X, including simple point mutations at pre-binding sites, insertion and deletion mutations, microsatellite expansions, or tandem amplification of weak binding sites. Spreading of these silencing or activating chromatin modifications to adjacent regions results in massive mis-expression of neo-sex linked genes, and little correspondence between functionality of genes and their silencing on the neo-Y or dosage compensation on the neo-X. Intriguingly, the genomic regions being targeted by the dosage compensation complex on the neo-X and those becoming heterochromatic on the neo-Y show little overlap, possibly reflecting different propensities along the ancestral chromosome to adopt active or repressive chromatin configurations. Our findings have broad implications for current models of sex chromosome evolution, and demonstrate how mechanistic constraints can limit evolutionary adaptations. Our study also highlights how evolution can follow predictable genetic trajectories, by repeatedly acquiring the same 21-bp consensus motif for recruitment of the dosage compensation complex, yet utilizing a diverse array of random mutational changes to attain the same phenotypic outcome.

Predicting protein contact map using evolutionary and physical constraints by integer programming

Predicting protein contact map using evolutionary and physical constraints by integer programming
Zhiyong Wang, Jinbo Xu
(Submitted on 8 Aug 2013)

Motivation. Protein contact map describes the pairwise spatial and functional relationship of residues in a protein and contains key information for protein 3D structure prediction. Although studied extensively, it remains very challenging to predict contact map using only sequence information. Most existing methods predict the contact map matrix element-by-element, ignoring correlation among contacts and physical feasibility of the whole contact map. A couple of recent methods predict contact map based upon residue co-evolution, taking into consideration contact correlation and enforcing a sparsity restraint, but these methods require a very large number of sequence homologs for the protein under consideration and the resultant contact map may be still physically unfavorable.
Results. This paper presents a novel method PhyCMAP for contact map prediction, integrating both evolutionary and physical restraints by machine learning and integer linear programming (ILP). The evolutionary restraints include sequence profile, residue co-evolution and context-specific statistical potential. The physical restraints specify more concrete relationship among contacts than the sparsity restraint. As such, our method greatly reduces the solution space of the contact map matrix and thus, significantly improves prediction accuracy. Experimental results confirm that PhyCMAP outperforms currently popular methods no matter how many sequence homologs are available for the protein under consideration. PhyCMAP can predict contacts within minutes after PSIBLAST search for sequence homologs is done, much faster than the two recent methods PSICOV and EvFold.