Changes in epistatic interactions in the long-term evolution of HIV-1 protease

Aditi Gupta, Christoph Adami
(Submitted on 12 Aug 2014)

The human immunodeficiency virus type 1 (HIV-1) is evolving to keep up with a changing fitness landscape, due to the various drugs introduced to stop the virus’s replication. As the virus adapts, the information the virus encodes about its environment must change, and this change is reflected in the amino-acid composition of proteins, as well as in changes in viral RNAs, binding sites, and splice sites. Information can also be encoded in the interaction between residues in a single protein as well as across proteins, leading to a change in the epistatic patterns that can affect how the virus can change in the future. Measuring epistasis usually requires fitness measurements that are difficult to obtain in high throughput. Here we show that epistasis can be inferred from the pairwise information between residues, and study how epistasis and information have changed over the long term. Using HIV-1 protease sequence data from public databases covering the years 1998-2006 (from both treated and untreated subjects), we show that drug treatment has, on average, increased the protease’s per-site entropies. At the same time, the sum of mutual entropies across all pairs of residues within the protease increases significantly over the years, indicating an increase in epistasis in response to treatment, a trend not seen in sequences from untreated subjects. Our findings suggest that information theory can be an important tool to study long-term trends in the evolution of macromolecules.
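
To make the two summary statistics concrete, here is a minimal sketch that computes the mean per-site entropy and the sum of pairwise mutual information (the "mutual entropy" of the abstract) from a protein alignment. It assumes gap-free, equal-length protease sequences and plug-in frequency estimates with no small-sample correction; the function names are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_entropy(column):
    """Plug-in Shannon entropy (bits) of the symbols observed in one column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two columns: H(i) + H(j) - H(i, j)."""
    joint = list(zip(col_i, col_j))  # residue pairs, one per sequence
    return column_entropy(col_i) + column_entropy(col_j) - column_entropy(joint)


def summarize(alignment):
    """Return (mean per-site entropy, sum of MI over all site pairs)."""
    columns = list(zip(*alignment))  # one tuple of residues per site
    entropies = [column_entropy(col) for col in columns]
    total_mi = sum(mutual_information(columns[i], columns[j])
                   for i, j in combinations(range(len(columns)), 2))
    return sum(entropies) / len(entropies), total_mi


if __name__ == "__main__":
    # Two sites that always co-vary: each carries 1 bit, and they share 1 bit.
    print(summarize(["AK", "AK", "VR", "VR"]))
```

For the toy alignment both sites carry one bit of entropy and are perfectly coupled, so `summarize` returns `(1.0, 1.0)`. With real data, plug-in mutual information is biased upward for small samples, so comparisons across years with different sample sizes would need a correction.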

V genes in primates from whole genome shotgun data

David N Olivieri, Francisco Gambon-Deza

The adaptive immune system uses V genes for antigen recognition. The evolutionary diversification and selection processes within and across species and orders are poorly understood. Here, we studied the amino acid (AA) sequences obtained from translated in-frame V exons of immunoglobulins (IG) and T cell receptors (TR) from 16 primate species whose genomes have been sequenced. Multi-species comparative analysis supports the hypothesis that V genes in the IG loci undergo birth/death processes, thereby permitting rapid adaptability over evolutionary time. We also show that multiple cladistic groupings exist in the TRA (35 clades) and TRB (25 clades) V gene loci and that each primate species typically contributes at least one V gene to each of these clades. The results demonstrate that IG V genes and TR V genes have quite different evolutionary pathways: multiple duplications can explain the IG loci results, while co-evolutionary pressures can explain the phylogenetic results seen in genes of the TR loci. We describe how each of the 35 V gene clades of the TRA locus and the 25 clades of the TRB locus must have specific and necessary roles for the viability of the species.

Author post: Predicting evolution from the shape of genealogical trees

This guest post by Richard Neher discusses his preprint “Predicting evolution from the shape of genealogical trees” by Richard A. Neher, Colin A. Russell, and Boris I. Shraiman, arXived here. This is cross-posted from the Neher lab website.

In this preprint, a collaboration with Colin Russell and Boris Shraiman, we show that it is possible to predict which individual from a population is most closely related to future populations. To this end, we have developed a method that uses the branching pattern of genealogical trees to estimate which part of the tree contains the “fittest” sequences, where fit means rapidly multiplying. Those that multiply rapidly are the most likely to take over the population. We demonstrate the power of our method by predicting the evolution of seasonal influenza viruses.

How does it work?
Individuals adapt to a changing environment by accumulating beneficial mutations while avoiding deleterious ones. We model this process assuming that there are many such mutations, each changing fitness in small increments. Using this model, we calculate the probability that an individual that lived in the past at time t leaves n descendants in the present. This distribution depends critically on the fitness of the ancestral individual. We then extend this calculation to the probability of observing a certain branch in a genealogical tree reconstructed from a sample of sequences. A branch in a tree connects an individual A that lived at time t_A with fitness x_A to an individual B that lived at a later time t_B with fitness x_B, as illustrated in the figure. B has descendants in the sample; otherwise the branch would not be part of the tree. Furthermore, all sampled descendants of A are also descendants of B; otherwise the connection between A and B would have branched between t_A and t_B. We call the mathematical object describing fitness evolution between A and B the “branch propagator” and denote it by g(x_B, t_B | x_A, t_A). The joint probability distribution of the fitness values of all nodes of the tree is given by a product of branch propagators. We then calculate the expected fitness of each node and use it to rank the sampled sequences. The top-ranked sequence is our prediction for the sequence of the progenitor of the future population.
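
The ranking step can be illustrated with a minimal sketch. It keeps the structure described above (the joint distribution is a product of branch propagators, and each node is scored by its expected fitness), but the propagator is a hypothetical toy: Gaussian diffusion of fitness along a branch, times a weight that favours high fitness at nodes where the tree branches. It is not the propagator derived in the preprint, and the names `toy_propagator` and `expected_fitness`, the N(0, 1) prior on the root, and the discrete fitness grid are all assumptions of this sketch.

```python
import math


def toy_propagator(x_b, x_a, dt, k_b, diffusion=0.5, beta=1.0):
    """Hypothetical stand-in for g(x_B, t_B | x_A, t_A), NOT the preprint's.

    Gaussian fitness diffusion (variance diffusion * dt) times one factor
    exp(beta * x_b) per child branch of B (k_b children), so fitter nodes
    are favoured wherever the tree branches.
    """
    var = diffusion * dt
    gauss = math.exp(-(x_b - x_a) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return gauss * math.exp(beta * k_b * x_b)


def _rescale(vec):
    """Divide by the maximum to avoid under/overflow; marginals are unchanged."""
    top = max(vec)
    return [v / top for v in vec] if top > 0 else vec


def expected_fitness(children, times, root, grid, propagator=toy_propagator):
    """Posterior mean fitness of every node of a rooted tree.

    children: dict node -> list of child nodes; times: dict node -> time.
    Joint weight = N(0, 1) prior on the root fitness times the product of
    branch propagators; node marginals come from two-pass sum-product
    message passing on the discrete fitness grid `grid`.
    """
    n = len(grid)
    mats, up, inside = {}, {}, {}

    def collect(node):  # upward pass, leaves towards root
        prod = [1.0] * n
        for c in children.get(node, []):
            collect(c)
            dt, k = times[c] - times[node], len(children.get(c, []))
            mat = [[propagator(xb, xa, dt, k) for xb in grid] for xa in grid]
            mats[(node, c)] = mat
            up[c] = _rescale([sum(g * w for g, w in zip(row, inside[c])) for row in mat])
            prod = [p * m for p, m in zip(prod, up[c])]
        inside[node] = _rescale(prod)

    outside = {root: [math.exp(-x * x / 2) for x in grid]}
    means = {}

    def distribute(node):  # downward pass, root towards leaves
        marg = [o * i for o, i in zip(outside[node], inside[node])]
        means[node] = sum(x * m for x, m in zip(grid, marg)) / sum(marg)
        kids = children.get(node, [])
        for c in kids:
            excl = list(outside[node])  # all evidence except c's own subtree
            for s in kids:
                if s != c:
                    excl = [e * m for e, m in zip(excl, up[s])]
            mat = mats[(node, c)]
            outside[c] = _rescale([sum(excl[a] * mat[a][b] for a in range(n))
                                   for b in range(n)])
            distribute(c)

    collect(root)
    distribute(root)
    return means


if __name__ == "__main__":
    # R -> A -> (L2, L3) and R -> L1; all leaves sampled at time 2.
    children = {"R": ["A", "L1"], "A": ["L2", "L3"]}
    times = {"R": 0, "A": 1, "L1": 2, "L2": 2, "L3": 2}
    grid = [-8 + 0.1 * k for k in range(201)]
    means = expected_fitness(children, times, "R", grid)
    print(sorted(["L1", "L2", "L3"], key=means.get, reverse=True))
```

In this toy tree the two leaves below the recent branching at A receive a higher expected fitness (about 3) than the lone leaf L1 (about 2), so L1 is ranked last. The design choice of discretizing fitness keeps the message passing exact for any propagator one plugs in, at the cost of a quadratic number of propagator evaluations per branch.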

Why do we care?
Being able to predict evolution could have immediate applications. The best example is the seasonal influenza vaccine, which needs to be updated frequently to keep up with the evolving virus. Vaccine strains are chosen among sampled virus strains, and the more closely the chosen strain matches the future influenza virus population, the better the vaccine will be. Hence, by predicting a likely progenitor of the future population, our method could help to improve influenza vaccines. One of our predictions is shown in the figure, with the top-ranked sequence marked by a black arrow. Influenza is not the only possible application. Since the algorithm only requires a reconstructed tree as input, it can be applied to other rapidly evolving pathogens or to cancer cell populations. In addition to being useful, the ability to predict also implies that the model captures an essential aspect of evolutionary dynamics: influenza evolution depends to a substantial degree (enough to enable prediction) on the accumulation of small-effect mutations.

Comparison to other approaches
Given the importance of good influenza vaccines, there have been a number of previous efforts to anticipate influenza virus evolution, typically based on patterns of molecular evolution in historical data. Along these lines, Luksza and Lässig have recently presented an explicit fitness model for influenza virus evolution that rewards mutations at positions known to confer antigenic novelty and penalizes likely deleterious mutations (plus a few other things). Because it uses influenza-specific molecular signatures, this model is complementary to ours, which uses only the tree reconstructed from nucleotide sequences. Interestingly, the two models do more or less equally well, and combining different methods of prediction should yield more reliable results.

Sequence co-evolution gives 3D contacts and structures of protein complexes

Thomas A. Hopf, Charlotta P.I. Schärfe, João P.G.L.M. Rodrigues, Anna G. Green, Chris Sander, Alexandre M.J.J. Bonvin, Debora S. Marks

High-throughput experiments in bacteria and eukaryotic cells have identified tens of thousands of interactions between proteins. This genome-wide view of the protein interaction universe is coarse-grained, whilst fine-grained detail of macro-molecular interactions critically depends on lower-throughput, labor-intensive experiments. Computational approaches using measures of residue co-evolution across proteins show promise, but have been limited to specific interactions. Here we present a new generalized method showing that patterns of evolutionary sequence changes across proteins reflect residues that are close in space, with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We demonstrate that the inferred evolutionary coupling scores accurately predict inter-protein residue interactions and can distinguish between interacting and non-interacting proteins. To illustrate the utility of the method, we predict co-evolved contacts in 50 E. coli complexes (of unknown structure), including the unknown 3D interactions between subunits of ATP synthase, and find results consistent with detailed experimental data. We expect that the method can be generalized to genome-wide interaction predictions at residue resolution.
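
A minimal sketch of scoring inter-protein residue pairs from a paired alignment follows. It substitutes a simple, well-known co-variation score, mutual information with the average product correction (APC), for the evolutionary coupling scores of the preprint, which it does not reproduce. It assumes the rows of the two alignments are already paired by organism, computes the APC over the inter-protein block only, and the function name is hypothetical.

```python
from collections import Counter
from math import log2


def _entropy(values):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


def inter_protein_couplings(aln_a, aln_b, top=5):
    """Top inter-protein column pairs (i, j, score) by APC-corrected MI.

    aln_a, aln_b: equal-length sequences; row k of each must come from the
    same organism. APC is computed over the inter-protein block only.
    """
    cols_a, cols_b = list(zip(*aln_a)), list(zip(*aln_b))
    mi = [[_entropy(ca) + _entropy(cb) - _entropy(list(zip(ca, cb)))
           for cb in cols_b] for ca in cols_a]
    row_mean = [sum(row) / len(row) for row in mi]
    col_mean = [sum(col) / len(col) for col in zip(*mi)]
    overall = sum(row_mean) / len(row_mean)
    scores = []
    for i, row in enumerate(mi):
        for j, value in enumerate(row):
            # Average product correction: subtract the background co-variation
            # expected from column i's and column j's overall MI levels.
            apc = row_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
            scores.append((i, j, value - apc))
    scores.sort(key=lambda s: s[2], reverse=True)
    return scores[:top]
```

A real analysis would also down-weight closely related sequences and use far deeper alignments; the sketch only shows where an inter-protein score sits in the pipeline.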

Author post: Tandem duplications and the limits of natural selection in Drosophila yakuba and Drosophila simulans

This guest post is by Rebekah Rogers (@evolscientist) on her paper with coauthors “Tandem duplications and the limits of natural selection in Drosophila yakuba and Drosophila simulans” arXived here.

Tandem duplications are widely recognized as a source of genetic novelty. Duplication of gene sequences can result in adaptive evolution through the development of novel functions or specialization in subsets of ancestral functions when ‘spare parts’ are relieved of evolutionary constraints. Additionally, tandem duplications have the potential to create entirely novel gene structures through chimeric gene formation and recruitment of formerly non-coding sequence. Here, we survey the limits of standing variation for tandem duplications in natural populations of D. yakuba and D. simulans, estimate an upper bound on their mutation rates, and explore their role in rapid evolution.

Tandem duplicates on the X chromosome in D. simulans show an excess of high-frequency variants consistent with adaptive evolution through tandem duplication. Furthermore, we identify an overrepresentation of genes involved in rapidly evolving phenotypes such as chorion development and oogenesis, drug and toxin metabolism, chitin cuticle formation, chemosensory processes, lipases and endopeptidases expressed in male reproduction, as well as immune response to pathogens in both D. yakuba and D. simulans. The enrichment of such rapidly evolving functional classes points to a role for tandem duplicates in Red Queen dynamics and responses to strong selective pressures.
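
One simple way to quantify an "excess of high-frequency variants" is to compare the observed share of high-frequency derived variants with the share expected under the standard neutral model, in which the expected number of variants at derived count i in a sample of n chromosomes is proportional to 1/i. This minimal sketch assumes each duplication is a biallelic variant with the duplicated state as the derived allele and a constant-size neutral population; it illustrates the idea and is not the test used in the paper, and `high_freq_share` is a hypothetical name.

```python
def high_freq_share(derived_counts, n, cutoff=0.5):
    """Observed vs. neutral-expected share of segregating variants whose
    derived allele is at sample frequency above `cutoff` (n chromosomes)."""
    seg = [c for c in derived_counts if 0 < c < n]  # drop absent/fixed variants
    observed = sum(1 for c in seg if c / n > cutoff) / len(seg)
    # Standard neutral model, constant size: expected count at i is proportional to 1/i.
    weights = [1 / i for i in range(1, n)]
    expected = sum(w for i, w in enumerate(weights, start=1) if i / n > cutoff) / sum(weights)
    return observed, expected
```

An observed share well above the expected one is the kind of signal described above, although demography and ascertainment can produce similar skews.
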
In spite of the observed concordance across functional classes, we observe few duplicated genes that are shared across species, indicating that parallel recruitment of tandem duplications is rare. The span of duplicates in the population is quite limited, and we estimate that less than 15% of the genome is represented among the tandem duplications segregating in the entire population for each species. Moreover, many duplicates are present at low frequency and will have difficulty escaping the forces of drift during selective sweeps. This very limited standing variation, combined with low mutation rates for tandem duplications, severely limits the substrate of genetic novelty that is available for adaptation.
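
The genome-span estimate amounts to merging overlapping duplicated segments and dividing the merged length by the genome size. A minimal sketch, assuming duplications are given as 0-based, end-exclusive (chromosome, start, end) intervals; `covered_fraction` is a hypothetical helper, not the authors' code.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of the genome spanned by the union of half-open intervals.

    intervals: iterable of (chrom, start, end), 0-based and end-exclusive.
    """
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    covered = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or abutting: extend the current run
                cur_end = max(cur_end, end)
            else:  # gap: close the current run and start a new one
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / genome_length
```

Merging first matters because duplications segregating in different individuals often overlap; summing raw lengths would overstate the span.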

Thus, the limits of standing variation and the rate of new mutations are expected to play a vital role in defining evolutionary trajectories and the ability of organisms to adapt in the event of gross environmental change. Given the limited substrate of genetic novelty, we expect that if adaptation is dependent upon gene duplications, suboptimal outcomes in adaptive walks will be common, long wait times will occur for new phenotypic changes, and many multicellular eukaryotes will display limited ability to adapt to rapidly changing environments.

An experimentally determined evolutionary model dramatically improves phylogenetic fit

Jesse D Bloom

All modern approaches to molecular phylogenetics require a quantitative model for how genes evolve. Unfortunately, existing evolutionary models do not realistically represent the site-heterogeneous selection that governs actual sequence change. Attempts to remedy this problem have involved augmenting these models with a burgeoning number of free parameters. Here I demonstrate an alternative: experimental determination of a parameter-free evolutionary model via mutagenesis, functional selection, and deep sequencing. Using this strategy, I create an evolutionary model for influenza nucleoprotein that describes the gene phylogeny far better than existing models with dozens or even hundreds of free parameters. High-throughput experimental strategies such as the one employed here provide fundamentally new information that has the potential to transform the sensitivity of phylogenetic analyses.
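
To illustrate how measured site preferences can define a substitution model with no free parameters, here is a minimal sketch at the amino-acid level. It assumes a single uniform mutation rate between amino acids and a Metropolis-style fixation rule min(1, π_y / π_x), where π are hypothetical per-site amino-acid preferences measured by deep mutational scanning; these are simplifying assumptions of the sketch, not a description of the model in the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_rate_matrix(prefs, mutation_rate=1.0):
    """Site-specific amino-acid rate matrix built from measured preferences.

    prefs: dict amino acid -> preference (positive, summing to 1).
    Off-diagonal rate x -> y = mutation_rate * min(1, pi_y / pi_x); the
    diagonal entry makes each row sum to zero.
    """
    q = {x: {} for x in AMINO_ACIDS}
    for x in AMINO_ACIDS:
        for y in AMINO_ACIDS:
            if x != y:
                q[x][y] = mutation_rate * min(1.0, prefs[y] / prefs[x])
        q[x][x] = -sum(q[x].values())  # only off-diagonal entries exist here
    return q
```

Because this fixation rule satisfies detailed balance (π_x q_xy = π_y q_yx), the measured preferences are the stationary distribution of each site's process, and nothing is fitted to the phylogeny.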

motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences

Dennis Kostka, Tara Friedrich, Alisha K. Holloway, Katherine S. Pollard
(Submitted on 1 Feb 2014)

Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing whether two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by providing TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species; but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain. Assuming that motif counts are binomially distributed, and allowing for dependencies between motif instances in evolutionarily related sequences, we derive the probability mass function of the difference in motif counts between two nucleotide sequences. We provide a method to numerically estimate this distribution from genomic data and show through simulations that our estimator is accurate. Finally, we introduce the R package motifDiverge that implements our methodology and illustrate its application to gene regulatory enhancers identified by a mouse developmental time course experiment. While this study was motivated by analysis of regulatory motifs, our results can be applied to any problem involving two correlated Bernoulli trials.
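
The core quantity, the distribution of a difference of two correlated binomial counts, can be sketched as follows. This minimal version assumes n aligned positions that are independent and identically distributed pairs of correlated Bernoulli trials; it ignores overlapping motif instances and is not the motifDiverge R API. The function names are hypothetical.

```python
from math import comb


def motif_difference_pmf(n, p10, p01):
    """PMF of D = X - Y, the difference in motif counts between two sequences.

    Each of n aligned positions is an i.i.d. pair of correlated Bernoulli
    trials: p10 = P(motif in sequence 1 only), p01 = P(motif in sequence 2
    only). Positions where both or neither sequence has a motif add 0 to D,
    so the correlation enters only through p10 and p01.
    Returns a dict mapping d in -n..n to P(D = d).
    """
    p_same = 1.0 - p10 - p01
    pmf = {}
    for k_plus in range(n + 1):
        for k_minus in range(n - k_plus + 1):
            # Multinomial count: k_plus "sequence 1 only", k_minus "sequence 2 only".
            ways = comb(n, k_plus) * comb(n - k_plus, k_minus)
            prob = ways * p10 ** k_plus * p01 ** k_minus * p_same ** (n - k_plus - k_minus)
            pmf[k_plus - k_minus] = pmf.get(k_plus - k_minus, 0.0) + prob
    return pmf


def upper_tail(pmf, d_obs):
    """One-sided p-value P(D >= d_obs) for an excess of motifs in sequence 1."""
    return sum(p for d, p in pmf.items() if d >= d_obs)
```

Under a null of no directional change one would set p10 = p01 and read off the tail probability of the observed difference.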

Evolution at two levels of gene expression in yeast

Carlo G. Artieri, Hunter B. Fraser
(Submitted on 27 Nov 2013)

Despite the greater functional importance of protein levels, our knowledge of gene expression evolution is based almost entirely on studies of mRNA levels. In contrast, our understanding of how translational regulation evolves has lagged far behind. Here we have applied ribosome profiling, which measures both global mRNA levels and their translation rates, to two species of Saccharomyces yeast and their interspecific hybrid in order to assess the relative contributions of changes in mRNA abundance and translation to regulatory evolution. We report that both cis- and trans-acting regulatory divergence in translation are abundant, affecting at least 35% of genes. The majority of translational divergence acts to buffer changes in mRNA abundance, suggesting a widespread role for stabilizing selection acting across regulatory levels. Nevertheless, we observe evidence of lineage-specific selection acting on a number of yeast functional modules, including instances of reinforcing selection acting at both levels of regulation. Finally, we also uncover multiple instances of stop-codon readthrough that are conserved between species. Our analysis reveals the under-appreciated complexity of post-transcriptional regulatory divergence and indicates that partitioning the search for the locus of selection into the binary categories of ‘coding’ vs. ‘regulatory’ may overlook a significant source of selection, acting at multiple regulatory levels along the path from genotype to phenotype.
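
The cis/trans split used in hybrid designs can be sketched as follows: the two alleles in the F1 hybrid share the same trans environment, so their allelic ratio estimates the cis effect, and the remainder of the parental difference is attributed to trans. Translation efficiency is taken here as ribosome footprints per mRNA, and buffering as translational divergence with the opposite sign to mRNA divergence. This is a minimal sketch assuming already-normalized read counts; the function names are hypothetical and the paper's statistical tests are not reproduced.

```python
from math import log2


def partition_divergence(parent_a, parent_b, hybrid_a, hybrid_b):
    """Split a log2 expression difference between species a and b into cis and trans.

    parent_*: normalized expression in each parental species;
    hybrid_*: normalized allele-specific expression of each allele in the F1 hybrid.
    """
    total = log2(parent_a / parent_b)
    cis = log2(hybrid_a / hybrid_b)  # both alleles share one trans environment
    return {"total": total, "cis": cis, "trans": total - cis}


def translational_effect(mrna_a, mrna_b, fp_a, fp_b):
    """log2 ratio of translation efficiency (footprints / mRNA) of a vs. b."""
    return log2((fp_a / mrna_a) / (fp_b / mrna_b))


def is_buffering(mrna_log_ratio, te_log_ratio):
    """True when translational divergence opposes mRNA divergence."""
    return mrna_log_ratio * te_log_ratio < 0
```

For example, a gene with four-fold higher mRNA in species a but only two-fold higher footprints has a translational log2 ratio of -1, which partly offsets the mRNA difference and counts as buffering.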

Patterns of positive selection in seven ant genomes

Julien Roux, Eyal Privman, Sebastien Moretti, Josephine T. Daub, Marc Robinson-Rechavi, Laurent Keller
(Submitted on 19 Nov 2013)

The evolution of ant species is marked by remarkable adaptations that allowed the development of very complex social systems. To identify how ant-specific adaptations are associated with specific patterns of molecular evolution, we searched for signs of positive selection on amino-acid changes in proteins during the evolution of the ant lineage. We identified 24 functional categories of genes which were enriched for positively selected genes in the ant lineage. We also reanalyzed genome-wide datasets in bees and flies with the same methodology to check whether genes under positive selection in ants were also under positive selection in the other analyzed lineages. Notably, genes implicated in immunity were enriched for positively selected genes in the three lineages, ruling out the hypothesis that the evolution of hygienic behaviors in social insects caused a major relaxation of selective pressure on this set of genes. Our scan also indicated that genes implicated in neurogenesis and olfaction started to undergo increased positive selection before the evolution of sociality in Hymenoptera, although it is assumed that the main challenges of the olfactory and neural systems in this lineage occurred with the evolution of social living. Finally, the comparison between these three lineages allowed us to pinpoint molecular evolution patterns that were specific to the ant lineage. In particular, there was relaxed selective pressure on genes related to metabolism in ants but not in bees and flies, possibly reflecting the loss of flight in ant workers. By contrast, there was recurrent positive selection on genes with mitochondrial functions specifically in ants, suggesting that the activity of mitochondria was improved during ant evolution. This might have been an important step toward the evolution of the extreme lifespan that is a hallmark of this lineage.
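
The category-level enrichment can be illustrated with a standard one-sided hypergeometric test: given N genes of which K were detected as positively selected, how surprising is it that a category of n genes contains at least k of them? This is a minimal sketch, not necessarily the procedure used in the paper; the function name is hypothetical, and with many categories the p-values would still need a multiple-testing correction.

```python
from math import comb


def enrichment_p_value(n_genes, n_selected, n_in_category, n_selected_in_category):
    """One-sided hypergeometric p-value P(X >= observed): the chance that a
    category holds at least `n_selected_in_category` selected genes at random."""
    upper = min(n_selected, n_in_category)
    total = comb(n_genes, n_selected)
    hits = sum(comb(n_in_category, k) * comb(n_genes - n_in_category, n_selected - k)
               for k in range(n_selected_in_category, upper + 1))
    return hits / total
```

Using exact integer binomial coefficients keeps the tail sum free of rounding until the final division.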

A stochastic microscopic model for the dynamics of antigenic variation

Gustavo Guerberoff, Fernando Alvarez-Valin
(Submitted on 8 Nov 2013)

We present a novel model that describes the within-host evolutionary dynamics of parasites undergoing antigenic variation. The approach uses a multi-type branching process with two types of entities defined according to their relationship with the immune system: clans of resistant parasitic cells (i.e. groups of cells sharing the same antigen not yet recognized by the immune system) that may become sensitive, and individual sensitive cells that can acquire a new resistance, thus giving rise to a new clan. The simplicity of the model allows analytical treatment to determine the subcritical and supercritical regimes in the space of parameters. By incorporating a density-dependent mechanism, the model is able to capture additional relevant features observed in experimental data, such as the characteristic parasitemia waves. In summary, our approach provides a new general framework to address the dynamics of antigenic variation which can be easily adapted to cope with broader and more complex situations.
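
For a positively regular multi-type Galton-Watson process, the regime is governed by the largest eigenvalue ρ of the mean offspring matrix: outside degenerate cases, extinction is certain when ρ ≤ 1 and has probability below one when ρ > 1. The sketch below applies this to a hypothetical two-type, discrete-generation parametrization (parameters `persist`, `shed`, `switch`); it is not the authors' model, which may differ in both its time scale and its rates.

```python
import math


def spectral_radius_2x2(m):
    """Largest eigenvalue of a 2x2 non-negative matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    # Roots of l^2 - (a + d) l + (ad - bc) = 0; real because bc >= 0.
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * c)


def regime(persist, shed, switch):
    """Classify a hypothetical two-type branching process.

    Types: 0 = resistant clan, 1 = sensitive cell. Per generation a clan
    survives unrecognized with probability `persist` and sheds on average
    `shed` sensitive cells; a sensitive cell founds a new clan with
    probability `switch` and is otherwise cleared by the immune system.
    """
    mean_matrix = [[persist, shed], [switch, 0.0]]
    rho = spectral_radius_2x2(mean_matrix)
    label = "supercritical" if rho > 1 else "subcritical" if rho < 1 else "critical"
    return label, rho
```

With `persist=0.5` and `shed=10`, raising the switching probability from 0.01 to 0.1 moves ρ from about 0.65 to about 1.28, i.e. from the subcritical to the supercritical regime.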