Genome of octoploid plant maca (Lepidium meyenii) illuminates genomic basis for high altitude adaptation in the central Andes
Jun Sheng , Wei Chen , Yang Dong , Liangsheng Zhang , Jing Zhang , Yang Tian , Liang Yan , Guanghui Zhang , Xiao Wang , Yan Zeng , Jiajin Zhang , Xiao Ma , Yuntao Tan , Ni Long , Yangzi Wang , Yujin Ma , Yu Xue , Shumei Hao , Shengchao Yang , Wen Wang
Maca (Lepidium meyenii Walp, 2n = 8x = 64) of Brassicaceae family is an Andean economic plant cultivated on the 4000-4500 meters central sierra in Peru. Considering the rapid uplift of central Andes occurred 5 to 10 million years ago (Mya), an evolutionary question arises on how plants like maca acquire high altitude adaptation within short geological period. Here, we report the high-quality genome assembly of maca, in which two close-spaced maca-specific whole genome duplications (WGDs, ~ 6.7 Mya) were identified. Comparative genomics between maca and close-related Brassicaceae species revealed expansions of maca genes and gene families involved in abiotic stress response, hormone signaling pathway and secondary metabolite biosynthesis via WGDs. Retention and subsequent evolution of many duplicated genes may account for the morphological and physiological changes (i.e. small leaf shape and loss of vernalization) in maca for high altitude environment. Additionally, some duplicated maca genes under positive selection were identified with functions in morphological adaptation (i.e. MYB59) and development (i.e. GDPD5 and HDA9). Collectively, the octoploid maca genome sheds light on the important roles of WGDs in plant high altitude adaptation in the Andes.
Too packed to change: site-specific substitution rates and side-chain packing in protein evolution
María Laura Marcos, Julian Echave
In protein evolution, due to functional and biophysical constraints, the rates of amino acid substitution differ from site to site. Among the best predictors of site-specific rates is packing density. The packing density measure that best correlates with rates is the weighted contact number (WCN), the sum of inverse square distances between the site’s Cα and the other Cαs . According to a mechanistic stress model proposed recently, rates are determined by packing because mutating packed sites stresses and destabilizes the protein’s active conformation. While WCN is a measure of Cα packing, mutations replace side chains, which prompted us to consider whether a site’s evolutionary divergence is constrained by main-chain packing or side-chain packing. To address this issue, we extended the stress theory to model side chains explicitly. The theory predicts that rates should depend solely on side-chain packing. We tested these predictions on a data set of structurally and functionally diverse monomeric enzymes. We found that, on average, side-chain contact density (WCNρ ) explains 39.1% of among-sites rate variation, larger than main-chain contact density (WCNα ) which explains 32.1%. More importantly, the independent contribution of WCNα is only 0.7%. Thus, as predicted by the stress theory, site-specific evolutionary rates are determined by side-chain packing.
The rate and molecular spectrum of spontaneous mutations in the GC-rich multi-chromosome genome of Burkholderia cenocepacia
Marcus M Dillon, Way Sung, Michael Lynch, Vaughn S Cooper
Spontaneous mutations are ultimately essential for evolutionary change and are also the root cause of nearly all disease. However, until recently, both biological and technical barriers have prevented detailed analyses of mutation profiles, constraining our understanding of the mutation process to a few model organisms and leaving major gaps in our understanding of the role of genome content and structure on mutation. Here, we present a genome-wide view of the molecular mutation spectrum in Burkholderia cenocepacia, a clinically relevant pathogen with high %GC content and multiple chromosomes. We find that B. cenocepacia has low genome-wide mutation rates with insertion-deletion mutations biased towards deletions, consistent with the idea that deletion pressure reduces prokaryotic genome sizes. Unlike previously assayed organisms, B. cenocepacia exhibits a GC-mutation bias, which suggests that at least some genomes with high GC content may be driven to this point by unusual base-substitution mutation pressure. Notably, we also observed variation in both the rates and spectra of mutations among chromosomes, and a significant elevation of G:C>T:A transversions in late-replicating regions. Thus, although some patterns of mutation appear to be highly conserved across cellular life, others vary between species and even between chromosomes of the same species, potentially influencing the evolution of nucleotide composition and genome architecture.
Tissue-specific evolution of protein coding genes in human and mouse
Nadezda Kryuchkova, Marc Robinson-Rechavi
Protein-coding genes evolve at different rates, and the influence of different parameters, from gene size to expression level, has been extensively studied. While in yeast gene expression level is the major causal factor of gene evolutionary rate, the situation is more complex in animals. Here we investigate these relations further, especially taking in account gene expression in different organs as well as indirect correlations between parameters. We used RNA-seq data from two large datasets, covering 22 mouse tissues and 27 human tissues. Over all tissues, evolutionary rate only correlates weakly with levels and breadth of expression. The strongest explanatory factors of strong purifying selection are GC content, expression in many developmental stages, and expression in brain tissues. While the main component of evolutionary rate is purifying selection, we also find tissue-specific patterns for sites under neutral evolution and for positive selection. We observe fast evolution of genes expressed in testis, but also in other tissues, notably liver, which are explained by weak purifying selection rather than by positive selection.
GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands.
Florent Lassalle, Séverine Périan, Thomas Bataillon, Xavier Nesme, Laurent Duret, Vincent Daubin
The characterization of functional elements in genomes relies on the identification of the footprints of natural selection. In this quest, taking into account neutral evolutionary processes such as mutation and genetic drift is crucial because these forces can generate patterns that may obscure or mimic signatures of selection. In mammals, and probably in many eukaryotes, another such confounding factor called GC-Biased Gene Conversion (gBGC) has been documented. This mechanism generates patterns identical to what is expected under selection for higher GC-content, specifically in highly recombining genomic regions. Recent results have suggested that a mysterious selective force favouring higher GC-content exists in Bacteria but the possibility that it could be gBGC has been excluded. Here, we show that gBGC is probably at work in most if not all bacterial species. First we find a consistent positive relationship between the GC-content of a gene and evidence of intra-genic recombination throughout a broad spectrum of bacterial clades. Second, we show that the evolutionary force responsible for this pattern is acting independently from selection on codon usage, and could potentially interfere with selection in favor of optimal AU-ending codons. A comparison with data from human populations shows that the intensity of gBGC in Bacteria is comparable to what has been reported in mammals. We propose that gBGC is not restricted to sexual Eukaryotes but also widespread among Bacteria and could therefore be an ancestral feature of cellular organisms. We argue that if gBGC occurs in bacteria, it can account for previously unexplained observations, such as the apparent non-equilibrium of base substitution patterns and the heterogeneity of gene composition within bacterial genomes. Because gBGC produces patterns similar to positive selection, it is essential to take this process into account when studying the evolutionary forces at work in bacterial genomes.
Tackling drug resistant infection outbreaks of global pandemic Escherichia coli ST131 using evolutionary and epidemiological genomics
(Submitted on 4 Nov 2014)
High-throughput molecular approaches are required to investigate the origin and diffusion of antimicrobial resistance in rapidly radiating pathogen outbreaks. The most frequent cause of human infection is Escherichia coli, which is dominated by ST131, a single pandemic clone. This epidemic subtype possesses an extensive array of virulence elements and tolerates many drugs. Frequent global sweeps of new dominant ST131 varieties necessitate deep genomic scrutiny of their spread, evolution and lateral transfer of drug resistance genes. Phylogenetic methods that decipher past events can predict future patterns of virulence and transmission based on genetic signatures of adaptation and recombination. Antibiotic tolerance is controlled by natural variation in gene expression levels, which can initiate delayed cell growth. This dormancy allows survival despite drug exposure, and yet may only be present in part of the infecting cell population. Consequently, genomic epidemiology needs to explore the scale of phenotypic regulatory control acting on RNA. A multi-faceted approach can comprehensively assess antimicrobial resistance in E. coli ST131 in terms of within-host genetic heterogeneity, regulation of gene expression, and transmission dynamics between hosts to achieve a goal of pre-empting resistance before it emerges by optimising drug treatment protocols.
Introns structure patterns of variation in nucleotide composition in Arabidopsis thaliana and rice protein-coding genes
Adrienne Ressayre, Sylvain Glemin, Pierre Montalent, Laurana Serres-Giardi, Christine Dillmann, Johann Joets
Plant genomes are large, intron-rich and present a wide range of variation in coding region G+C content. Concerning coding regions, a sort of syndrome can be described in plants: the increase in G+C content is associated with both the increase in heterogeneity among genes within a genome and the increase in variation across genes. Taking advantage of the large number of genes composing plant genomes and the wide range of variation in gene intron number, we performed a comprehensive survey of the patterns of variation in G+C content at different scales from the nucleotide level to the genome scale in two species Arabidopsis thaliana and Oryza sativa, comparing the patterns in genes with different intron numbers. In both species, we observed a pervasive effect of gene intron number and location along genes on G+C content, codon and amino acid frequencies suggesting that in both species, introns have a barrier effect structuring G+C content along genes. In external gene regions (located upstream first or downstream last intron), species-specific factors are shaping G+C content while in internal gene regions (surrounded by introns), G+C content is constrained to remain within a range common to both species. In rice, introns appear as a major determinant of gene G+C content while in A. thaliana introns have a weaker but significant effect. The structuring effect of introns in both species is susceptible to explain the G+C content syndrome observed in plants.