A novel test for selection on cis-regulatory elements reveals positive and negative selection acting on mammalian transcriptional enhancers

Justin D. Smith, Kimberly F. McManus, Hunter B. Fraser
(Submitted on 7 Aug 2013)

Measuring natural selection on genomic elements involved in the cis-regulation of gene expression — such as transcriptional enhancers and promoters — is critical for understanding the evolution of genomes, yet it remains a major challenge. Many studies have attempted to detect positive or negative selection in these noncoding elements by searching for those with the fastest or slowest rates of evolution, but this can be problematic. Here we introduce a new approach to this issue, and demonstrate its utility on three mammalian transcriptional enhancers. Using results from saturation mutagenesis studies of these enhancers, we classified all possible point mutations as up-regulating, down-regulating, or silent, and determined which of these mutations have occurred on each branch of a phylogeny. Applying a framework analogous to Ka/Ks in protein-coding genes, we measured the strength of selection on up-regulating and down-regulating mutations, in specific branches as well as entire phylogenies. We discovered distinct modes of selection acting on different enhancers: while all three have experienced negative selection against down-regulating mutations, the selection pressures on up-regulating mutations vary. In one case we detected positive selection for up-regulation, while the other two had no detectable selection on up-regulating mutations. Our methodology is applicable to the growing number of saturation mutagenesis data sets, and provides a detailed picture of the mode and strength of natural selection acting on cis-regulatory elements.
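The Ka/Ks-style comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' estimator. It makes three assumptions. First, every possible point mutation has already been labelled 'up', 'down' or 'silent' from the mutagenesis data. Second, silent mutations serve as the neutral reference, playing the role Ks plays for coding genes. Third, the function name `regulatory_ratios` and the toy data are hypothetical. A ratio below 1 suggests negative selection on that class, and a ratio above 1 suggests positive selection. A real analysis would also need a significance test.

```python
from collections import Counter


def regulatory_ratios(effects, substitutions):
    """Ka/Ks-style ratios for up- and down-regulating mutations.

    effects: dict mapping each possible point mutation, e.g. (position, alt_base),
        to 'up', 'down' or 'silent' (hypothetical encoding).
    substitutions: mutations observed on a branch or across a phylogeny.
    Returns, for 'up' and 'down', (observed / possible) divided by the same
    rate computed for silent mutations.
    """
    possible = Counter(effects.values())
    observed = Counter(effects[m] for m in substitutions)
    if observed["silent"] == 0:
        raise ValueError("no silent substitutions; ratio is undefined")
    silent_rate = observed["silent"] / possible["silent"]
    return {
        cls: (observed[cls] / possible[cls]) / silent_rate
        for cls in ("up", "down")
    }


# Toy enhancer with 8 possible mutations (made-up labels and substitutions).
effects = {
    (1, "A"): "silent", (1, "G"): "silent", (2, "C"): "silent", (2, "T"): "silent",
    (3, "A"): "up", (3, "G"): "up", (4, "C"): "down", (4, "T"): "down",
}
print(regulatory_ratios(effects, [(1, "A"), (2, "C"), (3, "G")]))
# {'up': 1.0, 'down': 0.0}
```

Here the single up-regulating substitution occurs at the same per-site rate as silent ones (ratio 1), while the absence of down-regulating substitutions gives a ratio of 0, the pattern expected under negative selection against down-regulation.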

The molecular mechanism of a cis-regulatory adaptation in yeast

Jessica Chang, Yiqi Zhou, Xiaoli Hu, Lucia Lam, Cameron Henry, Erin M. Green, Ryosuke Kita, Michael S. Kobor, Hunter B. Fraser
(Submitted on 7 Aug 2013)

Despite recent advances in our ability to detect adaptive evolution involving the cis-regulation of gene expression, our knowledge of the molecular mechanisms underlying these adaptations has lagged far behind. Across all model organisms the causal mutations have been discovered for only a handful of gene expression adaptations, and even for these, mechanistic details (e.g. the trans-regulatory factors involved) have not been determined. We previously reported a polygenic gene expression adaptation involving down-regulation of the ergosterol biosynthesis pathway in the budding yeast Saccharomyces cerevisiae. Here we investigate the molecular mechanism of a cis-acting mutation affecting a member of this pathway, ERG28. We show that the causal mutation is a two-base deletion in the promoter of ERG28 that strongly reduces the binding of two transcription factors, Sok2 and Mot3, thus abolishing their regulation of ERG28. This down-regulation increases resistance to a widely used antifungal drug targeting ergosterol, similar to mutations disrupting this pathway in clinical yeast isolates. The identification of the causal genetic variant revealed that the selection likely occurred after the deletion was already present at high frequency in the population, rather than when it was a new mutation. These results provide a detailed view of the molecular mechanism of a cis-regulatory adaptation, and underscore the importance of this view to our understanding of evolution at the molecular level.

Our paper: Inferring HIV escape rates from multi-locus genotype data

This guest post is by Richard Neher on his paper with Taylor Kessinger and Alan Perelson: Kessinger et al. Inferring HIV escape rates from multi-locus genotype data. arXived here.
This is cross posted from the Neher lab website.

We have a new preprint on the arXiv (here on Haldane’s Sieve). This work is the result of a collaboration between us and Alan Perelson (LANL), and explores methods to estimate parameters of the HIV-immune system interaction from time-resolved sequence data. The focus of this paper is on early infection, which is dominated by a few rapid substitutions that fix because they prevent or reduce recognition of infected cells by the immune system via cytotoxic T-lymphocytes (CTL). CTL escape is one of the fastest instances of evolution I have come across: four to six mutations spread within a few weeks. It happens in most HIV infections and is partly predictable from the HLA genotype of the infected person. These substitutions are so rapid that clonal interference has to be modeled. Our method fits a reduced model of clonal interference to the typically very sparse data and thereby estimates the selection coefficients, aka escape rates.

Why do we want to know these numbers?
The number of viruses in the blood of an infected person peaks 2-3 weeks after infection and thereafter drops by 2-3 orders of magnitude. This drop is partly due to a response by the adaptive immune system. However, it has proved difficult to attribute this drop to specific parts of the immune response. The rates at which different mutations sweep through the population give us information about the pressure exerted by the T-cell clones that target the epitope containing each mutation.
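As a point of reference, the simplest way to turn a sweep into a rate treats a single escape mutation in isolation. If the mutation spreads logistically with escape rate ε, its log-odds grow linearly, logit x(t) = logit x(t0) + ε(t - t0), so two frequency measurements give an estimate. The helper below is hypothetical and for illustration only. It is exactly the kind of single-locus fit that a multi-locus approach aims to improve on, and it breaks down whenever a sparse sample shows the mutation at frequency 0 or 1.

```python
import math


def logit(x):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(x / (1.0 - x))


def single_locus_escape_rate(t1, x1, t2, x2):
    """Escape rate from two frequency measurements, assuming a logistic sweep."""
    if not (0.0 < x1 < 1.0 and 0.0 < x2 < 1.0):
        raise ValueError("frequencies must be strictly between 0 and 1")
    return (logit(x2) - logit(x1)) / (t2 - t1)


# A mutation rising from 10% to 90% over 10 days.
print(round(single_locus_escape_rate(0.0, 0.1, 10.0, 0.9), 3))  # 0.439 (per day)
```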

How do we do it?
Early in infection, the viral population is large and selection is strong. Under these conditions recombination is of minor importance, since most double, triple, and higher-order mutants are produced more efficiently by recurrent mutation than by recombination. This implies that mutations accumulate sequentially, each arising on a background in which all previous mutations are already present. The time at which a novel mutation arises is therefore tightly constrained by the trajectory of the preceding genotype. These constraints regularize the fitting problem to some degree, which makes multi-locus fitting more robust than single-locus fitting.
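The sequential-accumulation argument can be illustrated with a deterministic toy model. This sketch rests on its own assumptions and is not the reduced clonal-interference model fitted in the paper. Genotype k carries the first k escapes, and its growth advantage is the sum of their escape rates. It is seeded by mutation from genotype k-1 at a given origin time, with a hypothetical rate `mu`. Each founding abundance depends on how far genotype k-1 has already grown, so origin times and escape rates cannot be chosen independently. That coupling is the constraint that helps regularize the fit. Function names are hypothetical.

```python
import math


def genotype_frequencies(escape_rates, origin_times, t, mu=1e-5):
    """Frequencies at time t >= 0 of nested genotypes carrying 0, 1, ..., K escapes.

    Deterministic sketch: genotype k grows at F_k = eps_1 + ... + eps_k relative
    to the wild type and is founded at origin_times[k-1] with abundance mu times
    that of genotype k-1 at that moment. Recombination, back mutation and drift
    are ignored.
    """
    taus = [0.0] + list(origin_times)
    growth = [0.0]
    for eps in escape_rates:
        growth.append(growth[-1] + eps)
    founders = [1.0]  # wild-type abundance, used as the reference
    for k in range(1, len(taus)):
        dt = taus[k] - taus[k - 1]
        # A genotype cannot be founded before its parent genotype exists.
        parent = founders[k - 1] * math.exp(growth[k - 1] * dt) if dt >= 0 else 0.0
        founders.append(mu * parent)
    abundances = [
        founders[k] * math.exp(growth[k] * (t - taus[k])) if t >= taus[k] else 0.0
        for k in range(len(taus))
    ]
    total = sum(abundances)
    return [a / total for a in abundances]


def escape_frequencies(genotype_freqs):
    """Frequency of escape i = total frequency of genotypes carrying >= i escapes."""
    return [sum(genotype_freqs[i:]) for i in range(1, len(genotype_freqs))]


freqs = genotype_frequencies([0.5, 0.3], [0.0, 10.0], t=30.0)
print([round(f, 3) for f in escape_frequencies(freqs)])  # [0.97, 0.004]
```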

What do we learn about evolution in general?
In addition to the intrinsic interest in the HIV/CTL interaction, CTL escape is an ideal setting to study rapidly evolving populations. This evolution happens in its “natural” habitat, and both the selective pressure and the functional consequences of the observed molecular changes can be quantified via immunological data, protein structure, and replication assays. In addition, we have ample cross-sectional data (HIV sequences from many different patients) that allow us to look at the prevalence of the escape mutations and of potential compensatory mutations. None of this is done in this paper, but HIV/immune-system coevolution is a fascinating showcase of rapid evolution.

Inferring HIV escape rates from multi-locus genotype data

Taylor A. Kessinger, Alan S. Perelson, Richard A. Neher
(Submitted on 6 Aug 2013)

Cytotoxic T-lymphocytes (CTLs) recognize viral protein fragments displayed by major histocompatibility complex (MHC) molecules on the surface of virally infected cells and generate an anti-viral response that can kill the infected cells. Virus variants whose protein fragments are not efficiently presented on infected cells or whose fragments are presented but not recognized by CTLs therefore have a competitive advantage and spread rapidly through the population. We present a method that allows a more robust estimation of these escape rates from serially sampled sequence data. The proposed method accounts for competition between multiple escapes by explicitly modeling the accumulation of escape mutations and the stochastic effects of rare multiple mutants. Applying our method to serially sampled HIV sequence data, we estimate rates of HIV escape that are substantially larger than those previously reported. The method can be extended to complex escapes that require compensatory mutations. We expect our method to be applicable in other contexts such as cancer evolution where time series data is also available.

Macro-evolutionary models and coalescent point processes: The shape and probability of reconstructed phylogenies

Amaury Lambert, Tanja Stadler
(Submitted on 6 Aug 2013)

Forward-time models of diversification (i.e., speciation and extinction) produce phylogenetic trees that grow “vertically” as time goes by. Pruning the extinct lineages out of such trees leads to natural models for reconstructed trees (i.e., phylogenies of extant species). Alternatively, reconstructed trees can be modelled by coalescent point processes (CPP), where trees grow “horizontally” by the sequential addition of vertical edges. Each new edge starts at some random speciation time and ends at the present time; speciation times are drawn from the same distribution independently. CPP lead to extremely fast computation of tree likelihoods and simulation of reconstructed trees. Their topology always follows the uniform distribution on ranked tree shapes (URT). We characterize which forward-time models lead to URT reconstructed trees and among these, which lead to CPP reconstructed trees. We show that for any “asymmetric” diversification model in which speciation rates only depend on time and extinction rates only depend on time and on a non-heritable trait (e.g., age), the reconstructed tree is CPP, even if extant species are incompletely sampled. If rates additionally depend on the number of species, the reconstructed tree is (only) URT (but not CPP). We characterize the common distribution of speciation times in the CPP description, and discuss incomplete species sampling as well as three special model cases in detail: 1) extinction rate does not depend on a trait; 2) rates do not depend on time; 3) mass extinctions may happen additionally at certain points in the past.
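The "horizontal" construction is simple enough to sketch directly. The sketch assumes node depths are i.i.d. draws from some distribution. Exponential with rate 1 is used below purely as an illustration; the paper characterizes which distribution a given forward-time model induces. The tree is read off from the sequence of depths, and the common ancestor of two tips sits at the deepest node between them. The likelihood of the depths takes a single pass over the list, which is why CPP computations are fast. Function names are hypothetical. The likelihood covers the node depths only and applies no conditioning on the number of tips.

```python
import math
import random


def simulate_cpp(sample_depth, origin_time, rng):
    """Node depths H_1, ..., H_{n-1} of a coalescent point process.

    Depths are drawn i.i.d. from sample_depth until one exceeds origin_time;
    the reconstructed tree then has n = len(depths) + 1 extant tips.
    """
    depths = []
    while True:
        h = sample_depth(rng)
        if h > origin_time:
            return depths
        depths.append(h)


def mrca_depth(depths, i, j):
    """Time back to the common ancestor of tips i < j: the deepest node between them."""
    return max(depths[i:j])


def cpp_log_likelihood(depths, log_density, log_tail, origin_time):
    """Log-density of the depths: sum of log f(H_k) plus log P(H > origin_time)."""
    return sum(log_density(h) for h in depths) + log_tail(origin_time)


# Illustrative choice: exponentially distributed node depths with rate 1.
rate = 1.0
depths = simulate_cpp(lambda r: r.expovariate(rate), 3.0, random.Random(1))
loglik = cpp_log_likelihood(
    depths,
    log_density=lambda h: math.log(rate) - rate * h,
    log_tail=lambda T: -rate * T,
    origin_time=3.0,
)
print(len(depths) + 1, "tips, log-likelihood", round(loglik, 3))
```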

Bayesian genome assembly and assessment by Markov Chain Monte Carlo sampling

Mark Howison, Felipe Zapata, Erika J. Edwards, Casey W. Dunn
(Submitted on 6 Aug 2013)

Most genome assemblers provide a point estimate of the true genome sequence, chosen from among many alternative hypotheses that are supported by the data. We present a Markov Chain Monte Carlo approach to sequence assembly that instead generates a distribution of assembly hypotheses with quantified probabilities. This statistically explicit Bayesian approach to assembly allows the investigator to evaluate alternative assembly hypotheses in a unified framework and propagate uncertainty about genome assembly to downstream analyses. We implement this approach in a prototype assembler and illustrate its application to the genome of the bacteriophage ΦX174.
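The idea of reporting a distribution over assemblies rather than a single answer can be illustrated with a generic Metropolis sampler over a handful of candidate assemblies. This is a toy, not the authors' prototype assembler. It assumes the candidates have already been enumerated and that each has a known unnormalized log posterior. The numbers below are invented, and a real assembler must propose new assemblies from the reads. The fraction of steps spent on each candidate estimates its posterior probability.

```python
import math
import random
from collections import Counter


def metropolis_over_hypotheses(log_posterior, n_steps, rng, start=0):
    """Metropolis sampler over a finite list of hypotheses.

    log_posterior[i] is the unnormalized log posterior of hypothesis i.
    Proposals are uniform over all hypotheses (a symmetric proposal), so the
    visit frequencies converge to the normalized posterior.
    """
    current = start
    visits = Counter()
    for _ in range(n_steps):
        proposal = rng.randrange(len(log_posterior))
        log_ratio = log_posterior[proposal] - log_posterior[current]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        visits[current] += 1
    return {h: visits[h] / n_steps for h in range(len(log_posterior))}


# Three made-up candidate assemblies with unnormalized posteriors 6 : 3 : 1.
scores = [math.log(6), math.log(3), math.log(1)]
estimate = metropolis_over_hypotheses(scores, 100_000, random.Random(0))
print({h: round(p, 2) for h, p in estimate.items()})
```

With enough steps the printed probabilities approach 0.6, 0.3 and 0.1.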

Proceedings of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

Aaron Darling, Jens Stoye
(Submitted on 6 Aug 2013)

These are the proceedings of the 13th Workshop on Algorithms in Bioinformatics, WABI2013, which was held September 2-4 2013 in Sophia Antipolis, France. All manuscripts were peer reviewed by the WABI2013 program committee and external reviewers.

Lineage specific reductions in genome size in salamanders are associated with increased rates of mutation

John Herrick, Bianca Sclavi
(Submitted on 4 Aug 2013)

Very low levels of genetic diversity have been reported in vertebrates with large genomes, notably salamanders and lungfish [1-3]. Interpreting differences in heterozygosity, which reflects genetic diversity in a population, is complicated because levels of heterozygosity vary widely between conspecific populations, and correlate with many different physiological and demographic variables such as body size and effective population size. Here we return to the question of genetic variability in salamanders, and report on the relationship between evolutionary rates and genome sizes in five different salamander families. We found that rates of evolution are exceptionally low in salamanders as a group. Evolutionary rates are as low as those reported for cartilaginous fish, which have the slowest rates recorded so far in vertebrates [4]. We also found that, independent of life history, salamanders with the smallest genomes (14 pg) are evolving at rates two to three times faster than salamanders with the largest genomes (>50 pg). After accounting for evolutionary duration, we conclude that speciation events in salamanders are associated with contractions in genome size and concomitant increases in mutation and diversification rates.

Effect of linkage on the equilibrium frequency of deleterious mutations

Sona John, Kavita Jain
(Submitted on 5 Aug 2013)

We study the evolution of an asexual population of binary sequences of finite length in which both deleterious and reverse mutations can occur. Such a model has been used to understand the prevalence of preferred codons due to selection, mutation and drift, and has been proposed as a possible mechanism for halting the irreversible degeneration of asexual populations due to Muller’s ratchet. Using an analytical argument and numerical simulations, we study the dependence of the equilibrium fraction of deleterious mutations on various population genetic parameters. In contrast to the one-locus theory, where the fraction of disadvantageous mutations decreases exponentially fast with increasing population size, we find that in the multilocus model, it decreases to zero exponentially for very large populations but approaches a constant for smaller populations logarithmically. The weak dependence on the population size may explain the similar levels of codon bias seen in populations of different sizes.
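The model in this abstract can be simulated directly. The sketch below is a generic Wright-Fisher implementation under stated assumptions: haploid individuals, multiplicative fitness (1 - s)^k for k deleterious alleles with s < 1, per-site forward and back mutation probabilities u and v, and no recombination. It is not the authors' code, and the function name and parameter values are illustrative. Running it for several population sizes N at fixed s, u and v is one way to explore how the deleterious fraction depends on N.

```python
import random


def deleterious_fraction(N, L, s, u, v, generations, rng):
    """Fraction of deleterious alleles after a Wright-Fisher run.

    Haploid asexual population of N binary sequences of length L
    (1 = deleterious allele), starting mutation-free. Each generation, parents
    are drawn with weight (1 - s)**k for k deleterious alleles, then each site
    mutates 0 -> 1 with probability u and 1 -> 0 with probability v.
    """
    population = [[0] * L for _ in range(N)]
    for _ in range(generations):
        fitness = [(1.0 - s) ** sum(seq) for seq in population]
        parents = rng.choices(population, weights=fitness, k=N)
        offspring = []
        for seq in parents:
            child = []
            for allele in seq:
                if allele == 0 and rng.random() < u:
                    allele = 1  # forward (deleterious) mutation
                elif allele == 1 and rng.random() < v:
                    allele = 0  # reverse mutation
                child.append(allele)
            offspring.append(child)
        population = offspring
    return sum(map(sum, population)) / (N * L)


rng = random.Random(42)
for N in (10, 100):
    print(N, deleterious_fraction(N, L=10, s=0.05, u=0.002, v=0.002, generations=500, rng=rng))
```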

The pattern and distribution of deleterious mutations in maize

Sofiane Mezmouk, Jeffrey Ross-Ibarra
(Submitted on 2 Aug 2013)

Most non-synonymous mutations are thought to be deleterious because of their effect on protein sequence. These polymorphisms are expected to be removed or kept at low frequency by the action of natural selection, and rare deleterious variants have been implicated as a possible explanation for the “missing heritability” seen in many studies of complex traits. Nonetheless, the effect of positive selection on linked sites or drift in small or inbred populations may also impact the evolution of deleterious alleles. Here, we made use of genome-wide genotyping data to characterize deleterious variants in a large panel of maize inbred lines. We show that, in spite of small effective population sizes and inbreeding, most putatively deleterious SNPs are indeed at low frequencies within individual genetic groups. We find that genes showing associations with a number of complex traits are enriched for deleterious variants. Together these data are consistent with the dominance model of heterosis, in which complementation of numerous low-frequency, weakly deleterious variants contributes to hybrid vigor.
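The complementation logic of the dominance model can be illustrated with a toy calculation. It makes three assumptions: each inbred line is fully homozygous, deleterious alleles are fully recessive, and effects multiply across loci with a common cost s. The locus sets and s = 0.05 are made up, and the function names are hypothetical. Low-frequency variants are rarely shared between two lines, so the hybrid expresses only the few loci that are deleterious in both parents.

```python
def expressed_load(parent_a, parent_b):
    """Loci homozygous for a deleterious allele in the F1 of two inbred lines.

    parent_a, parent_b: sets of loci where each homozygous inbred line carries
    a deleterious allele. With fully recessive deleterious alleles, the hybrid
    only expresses loci that are deleterious in both parents.
    """
    return parent_a & parent_b


def multiplicative_fitness(n_loci, s):
    """Fitness with n_loci expressed deleterious loci, each costing a factor (1 - s)."""
    return (1.0 - s) ** n_loci


line_a = {1, 2, 3, 7}
line_b = {3, 4, 5}
hybrid = expressed_load(line_a, line_b)
print(sorted(hybrid))                                          # [3]
print(round(multiplicative_fitness(len(line_a), 0.05), 4))     # 0.8145
print(round(multiplicative_fitness(len(line_b), 0.05), 4))     # 0.8574
print(round(multiplicative_fitness(len(hybrid), 0.05), 4))     # 0.95
```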