Too packed to change: site-specific substitution rates and side-chain packing in protein evolution

María Laura Marcos, Julian Echave
doi: http://dx.doi.org/10.1101/013359

In protein evolution, due to functional and biophysical constraints, the rates of amino acid substitution differ from site to site. Among the best predictors of site-specific rates is packing density. The packing density measure that best correlates with rates is the weighted contact number (WCN), the sum of inverse square distances between the site’s Cα and the other Cαs. According to a mechanistic stress model proposed recently, rates are determined by packing because mutating packed sites stresses and destabilizes the protein’s active conformation. While WCN is a measure of Cα packing, mutations replace side chains, which prompted us to consider whether a site’s evolutionary divergence is constrained by main-chain packing or side-chain packing. To address this issue, we extended the stress theory to model side chains explicitly. The theory predicts that rates should depend solely on side-chain packing. We tested these predictions on a data set of structurally and functionally diverse monomeric enzymes. We found that, on average, side-chain contact density (WCNρ) explains 39.1% of among-sites rate variation, larger than main-chain contact density (WCNα), which explains 32.1%. More importantly, the independent contribution of WCNα is only 0.7%. Thus, as predicted by the stress theory, site-specific evolutionary rates are determined by side-chain packing.
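The WCN definition quoted in the abstract (sum of inverse square distances from one site to all others) can be illustrated with a minimal Python sketch. The function name `weighted_contact_number` and the toy coordinates are hypothetical; the abstract does not say how side-chain positions are represented for WCNρ, so the sketch simply takes one 3D point per site and leaves the choice of point (Cα or a side-chain reference point) to the caller.

```python
def weighted_contact_number(coords):
    """Return WCN_i = sum over j != i of 1 / d_ij**2 for each site i.

    coords: list of (x, y, z) tuples, one representative point per site.
    Passing C-alpha coordinates gives a main-chain measure; passing a
    side-chain reference point per site (an assumption here) would give
    a side-chain measure.
    """
    wcn = []
    for i, a in enumerate(coords):
        total = 0.0
        for j, b in enumerate(coords):
            if i == j:
                continue  # a site does not contact itself
            squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
            total += 1.0 / squared_distance
        wcn.append(total)
    return wcn


# Hypothetical toy input: three collinear sites spaced one unit apart.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
wcn = weighted_contact_number(coords)
print(wcn)
```

In this toy case the middle site has the highest WCN because it is closest to both neighbours, which is the sense in which WCN measures how tightly a site is packed.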

High-resolution genomic surveillance of 2014 ebolavirus using shared subclonal variants

Kevin J Emmett, Albert K Lee, Hossein Khiabanian, Raul Rabadan
doi: http://dx.doi.org/10.1101/013318

Viral outbreaks, such as the 2014 ebolavirus, can spread rapidly and have complex evolutionary dynamics, including coinfection and bulk transmission of multiple viral populations. Genomic surveillance can be hindered when the spread of the outbreak exceeds the evolutionary rate, in which case consensus approaches will have limited resolution. Deep sequencing of infected patients can identify genomic variants present in intrahost populations at subclonal frequencies (i.e. <50%). Shared subclonal variants (SSVs) can provide additional phylogenetic resolution and inform about disease transmission patterns. Here, we use metrics from population genetics to analyze data from the 2014 ebolavirus outbreak in Sierra Leone and identify phylogenetic signal arising from SSVs. We use methods derived from information theory to measure a lower bound on transmission bottleneck size that is larger than one founder population, yet significantly smaller than the intrahost effective population. Our results demonstrate the important role of shared subclonal variants in genomic surveillance.
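As a rough illustration of what "shared subclonal variant" means, the sketch below flags variants that sit below 50% frequency in at least two samples. It is only an assumption-laden toy: the function name, the variant labels, the lower detection threshold `min_freq`, and the "at least two samples" rule are all hypothetical choices, not the authors' pipeline.

```python
from collections import defaultdict


def shared_subclonal_variants(samples, min_freq=0.05, max_freq=0.5, min_samples=2):
    """Return variants that are subclonal in at least `min_samples` samples.

    samples: dict mapping sample id -> dict of variant label -> frequency.
    A variant counts as subclonal when min_freq <= frequency < max_freq.
    max_freq = 0.5 follows the "<50%" in the abstract; min_freq is an
    assumed noise floor.
    """
    carriers = defaultdict(list)
    for sample_id, variants in samples.items():
        for variant, freq in variants.items():
            if min_freq <= freq < max_freq:
                carriers[variant].append(sample_id)
    return {v: sorted(ids) for v, ids in carriers.items() if len(ids) >= min_samples}


# Hypothetical intrahost variant frequencies for three patients.
samples = {
    "patient_A": {"pos100:A>G": 0.20, "pos200:C>T": 0.95},
    "patient_B": {"pos100:A>G": 0.35, "pos300:G>A": 0.10},
    "patient_C": {"pos200:C>T": 0.40},
}
ssvs = shared_subclonal_variants(samples)
print(ssvs)
```

Here only `pos100:A>G` qualifies: it is subclonal in two patients, whereas `pos200:C>T` is near fixation in one patient and subclonal in just one other.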

Reconstructing gene content in the last common ancestor of cellular life: is it possible, should it be done, and are we making any progress?

Arcady Mushegian
doi: http://dx.doi.org/10.1101/013326

I review recent literature on the reconstruction of gene repertoire of the Last Universal Common Ancestor of cellular life (LUCA). The form of the phylogenetic record of cellular life on Earth is important to know in order to reconstruct any ancestral state; therefore I also discuss the emerging understanding that this record does not take the form of a tree. I argue that despite this, “tree-thinking” remains an essential component in evolutionary thinking and that “pattern pluralism” in evolutionary biology can be only epistemological, but not ontological.

The evolutionarily stable distribution of fitness effects

Daniel P. Rice, Benjamin H. Good, Michael M. Desai
doi: http://dx.doi.org/10.1101/013052

The distribution of fitness effects of new mutations (the DFE) is a key parameter in determining the course of evolution. This fact has motivated extensive efforts to measure the DFE or to predict it from first principles. However, just as the DFE determines the course of evolution, the evolutionary process itself constrains the DFE. Here, we analyze a simple model of genome evolution in a constant environment in which natural selection drives the population toward a dynamic steady state where beneficial and deleterious substitutions balance. The distribution of fitness effects at this steady state is stable under further evolution, and provides a natural null expectation for the DFE in a population that has evolved in a constant environment for a long time. We calculate how the shape of the evolutionarily stable DFE depends on the underlying population genetic parameters. We show that, in the absence of epistasis, the ratio of beneficial to deleterious mutations of a given fitness effect obeys a simple relationship independent of population genetic details. Finally, we analyze how the stable DFE changes in the presence of a simple form of diminishing returns epistasis.

DNA-guided establishment of canonical nucleosome patterns in a eukaryotic genome

Leslie Y Beh, Noam Kaplan, Manuel M Muller, Tom W Muir, Laura F Landweber
doi: http://dx.doi.org/10.1101/013250

A conserved hallmark of eukaryotic chromatin architecture is the distinctive array of well-positioned nucleosomes downstream of transcription start sites (TSS). Recent studies indicate that trans-acting factors establish this stereotypical array. Here, we present the first genome-wide in vitro and in vivo nucleosome maps for the ciliate Tetrahymena thermophila. In contrast with previous studies in yeast, we find that the stereotypical nucleosome array is preserved in the in vitro reconstituted map, which is governed only by the DNA sequence preferences of nucleosomes. Remarkably, this average in vitro pattern arises from the presence of subsets of nucleosomes, rather than the whole array, in individual Tetrahymena genes. Variation in GC content contributes to the positioning of these sequence-directed nucleosomes, and affects codon usage and amino acid composition in genes. We propose that these ‘seed’ nucleosomes may aid the AT-rich Tetrahymena genome – which is intrinsically unfavorable for nucleosome formation – in establishing nucleosome arrays in vivo in concert with trans-acting factors, while minimizing changes to the coding sequences they are embedded within.

Model Inadequacy and Mistaken Inferences of Trait-Dependent Speciation

Daniel L. Rabosky, Emma E. Goldberg
(Submitted on 22 Dec 2014)

Species richness varies widely across the tree of life, and there is great interest in identifying ecological, geographic, and other factors that affect rates of species proliferation. Recent methods for explicitly modeling the relationships among character states, speciation rates, and extinction rates on phylogenetic trees (BiSSE, QuaSSE, GeoSSE, and related models) have been widely used to test hypotheses about character state-dependent diversification rates. Here, we document the disconcerting ease with which neutral traits are inferred to have statistically significant associations with speciation rate. We first demonstrate this unfortunate effect for a known model assumption violation: shifts in speciation rate associated with a character not included in the model. We further show that for many empirical phylogenies, characters simulated in the absence of state-dependent diversification exhibit an even higher Type I error rate, indicating that the method is susceptible to additional, unknown model inadequacies. For traits that evolve slowly, the root cause appears to be a statistical framework that does not require replicated shifts in character state and diversification. However, spurious associations between character state and speciation rate arise even for traits that lack phylogenetic signal, suggesting that phylogenetic pseudoreplication alone cannot fully explain the problem. The surprising severity of this phenomenon suggests that many trait-diversification relationships reported in the literature may not be real. More generally, we highlight the need for diagnosing and understanding the consequences of model inadequacy in phylogenetic comparative methods.

Marker-based estimation of heritability in immortal populations

Willem Kruijer, Martin Boer, Marcos Malosetti, Padraic J. Flood, Bas Engel, Rik Kooke, Joost Keurentjes, Fred van Eeuwijk
(Submitted on 21 Dec 2014)

Heritability is a central parameter in quantitative genetics, both from an evolutionary and a breeding perspective. For plant traits, heritability is traditionally estimated by comparing within- and between-genotype variability. This approach estimates broad-sense heritability, and does not account for different genetic relatedness. With the availability of high-density markers there is growing interest in marker-based estimates of narrow-sense heritability, using mixed models in which genetic relatedness is estimated from genetic markers. Such estimates have received much attention in human genetics but are rarely reported for plant traits. A major obstacle is that current methodology and software assume a single phenotypic value per genotype, hence requiring genotypic means. An alternative that we propose here is to use mixed models at individual plant or plot level. Using statistical arguments, simulations and real data, we investigate the feasibility of both approaches, and how these affect genomic prediction with G-BLUP and genome-wide association studies. Heritability estimates obtained from genotypic means had very large standard errors and were sometimes biologically unrealistic. Mixed models at individual plant or plot level produced more realistic estimates, and for simulated traits standard errors were up to 13 times smaller. Genomic prediction was also improved by using these mixed models, with up to a 49% increase in accuracy. For GWAS on simulated traits, the use of individual plant data gave almost no increase in power. The new methodology is applicable to any complex trait where multiple replicates of individual genotypes can be scored. This includes important agronomic crops, as well as bacteria and fungi.
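The "traditional" approach mentioned in the abstract, comparing within- and between-genotype variability, can be sketched with a one-way ANOVA estimator of broad-sense heritability at the individual-plant level. This is one common textbook estimator for balanced designs, not necessarily the one used by the authors, and it is not the marker-based mixed model they propose; the function name and toy phenotypes are hypothetical.

```python
def broad_sense_heritability(groups):
    """Estimate broad-sense heritability from replicated genotypes.

    groups: dict mapping genotype -> list of replicate phenotype values.
    Assumes a balanced design (same number of replicates r per genotype).
    Uses the one-way ANOVA variance components:
        var_e = MS_within
        var_g = (MS_between - MS_within) / r
        H2    = var_g / (var_g + var_e)
    """
    r = len(next(iter(groups.values())))
    if r < 2 or any(len(values) != r for values in groups.values()):
        raise ValueError("need a balanced design with at least 2 replicates")
    g = len(groups)
    if g < 2:
        raise ValueError("need at least 2 genotypes")

    means = {k: sum(values) / r for k, values in groups.items()}
    grand_mean = sum(means.values()) / g

    ms_between = r * sum((m - grand_mean) ** 2 for m in means.values()) / (g - 1)
    ms_within = sum(
        (x - means[k]) ** 2 for k, values in groups.items() for x in values
    ) / (g * (r - 1))

    var_g = max((ms_between - ms_within) / r, 0.0)  # truncate negative estimates
    total = var_g + ms_within
    return var_g / total if total > 0 else 0.0


# Hypothetical phenotypes: three genotypes, two replicate plants each.
groups = {"g1": [9.0, 11.0], "g2": [19.0, 21.0], "g3": [29.0, 31.0]}
h2 = broad_sense_heritability(groups)
print(round(h2, 3))
```

Because this estimator treats all genotypes as equally unrelated, it cannot separate additive from non-additive genetic variance, which is the limitation the abstract says marker-based mixed models are meant to address.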