Landscape and evolutionary dynamics of terminal-repeat retrotransposons in miniature (TRIMs) in 48 whole plant genomes

Dongying Gao, Yupeng Li, Brian Abernathy, Scott Jackson
doi: http://dx.doi.org/10.1101/010850

Terminal-repeat retrotransposons in miniature (TRIMs) are structurally similar to long terminal repeat (LTR) retrotransposons except that they are extremely small and difficult to identify. Thus far, only a few TRIMs have been characterized in the euphyllophytes and the evolutionary and biological impacts and transposition mechanism of TRIMs are poorly understood. In this study, we combined de novo and homology-based methods to annotate TRIMs in 48 plant genome sequences, spanning land plants to algae. We found 156 TRIM families, 146 previously undescribed. Notably, we identified the first TRIMs in a lycophyte and non-vascular plants. The majority of the TRIM families were highly conserved and shared within and between plant families. Even though TRIMs contribute only a small fraction of any plant genome, they are enriched in or near genes and may play important roles in gene evolution. TRIMs were frequently organized into tandem arrays we called TA-TRIMs, another unique feature distinguishing them from LTR retrotransposons. Importantly, we identified putative autonomous retrotransposons that may mobilize specific TRIM elements and detected very recent transpositions of a TRIM in O. sativa. Overall, this comprehensive analysis of TRIMs across the entire plant kingdom provides insight into the evolution and conservation of TRIMs and the functional roles they may play in gene evolution.

Sharing and specificity of co-expression networks across 35 human tissues

Emma Pierson, GTEx Consortium, Daphne Koller, Alexis Battle, Sara Mostafavi
doi: http://dx.doi.org/10.1101/010843

To understand the regulation of tissue-specific gene expression, the GTEx Consortium generated RNA-seq expression data for more than thirty distinct human tissues. This data provides an opportunity for deriving shared and tissue-specific gene regulatory networks on the basis of co-expression between genes. However, a small number of samples are available for a majority of the tissues, and therefore statistical inference of networks in this setting is highly underpowered. To address this problem, we infer tissue-specific gene co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT, that uses a hierarchy of tissues to share data between related tissues. We show that this transfer learning approach increases the accuracy with which networks are learned. Analysis of these networks reveals that tissue-specific transcription factors are hubs that preferentially connect to genes with tissue-specific functions. Additionally, we observe that genes with tissue-specific functions lie at the peripheries of our networks. We identify numerous modules enriched for Gene Ontology functions, and show that modules conserved across tissues are especially likely to have functions common to all tissues, while modules that are upregulated in a particular tissue are often instrumental to tissue-specific function. Finally, we provide a web tool which allows exploration of gene function and regulation in a tissue-specific manner.
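The idea of building a network "on the basis of co-expression between genes" can be illustrated with a minimal sketch. This is not the GNAT algorithm described in the abstract (it does no sharing across a tissue hierarchy); it is just a plain correlation-threshold network for a single tissue. The function names, the toy expression values, and the 0.8 threshold are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.8):
    """Return (gene1, gene2, r) for gene pairs with |r| >= threshold.

    `expression` maps a gene name to its expression values across the
    samples of one tissue. The threshold is an illustrative assumption.
    """
    genes = sorted(expression)
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expression[g1], expression[g2])
            if abs(r) >= threshold:
                edges.append((g1, g2, round(r, 3)))
    return edges


# Toy data (hypothetical): three genes measured in four samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(toy_expression)
```

With only four samples per gene, as in this toy input, correlations are very noisy, which is the underpowered setting the abstract says motivates sharing data between related tissues.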

Exploring the phenotypic space and the evolutionary history of a natural mutation in Drosophila melanogaster

Anna Ullastres, Natalia Petit, Josefa González
doi: http://dx.doi.org/10.1101/010918

A major challenge of modern biology is elucidating the functional consequences of natural mutations. While we have a good understanding of the effects of lab-induced mutations on molecular- and organismal-level phenotypes, the study of natural mutations has lagged behind. In this work, we explore the phenotypic space and the evolutionary history of a previously identified adaptive transposable element insertion. We first combined several tests that capture different signatures of selection to show that there is evidence of positive selection in the regions flanking the FBti0019386 insertion. We then explored several phenotypes that are related to known phenotypic effects of nearby genes and that have plausible connections to fitness variation in nature. We found that flies with the FBti0019386 insertion had a shorter developmental time and were more sensitive to stress, which are likely to be the adaptive effect and the cost of selection of this mutation, respectively. Interestingly, these phenotypic effects are not consistent with a role of FBti0019386 in temperate adaptation, as has been previously suggested. Indeed, a global analysis of the population frequency of FBti0019386 showed that clinal frequency patterns are found in North America and Australia but not in Europe. Finally, we showed that FBti0019386 is associated with down-regulation of sra, most likely because it induces the formation of heterochromatin by recruiting HP1a protein. Overall, our integrative approach allowed us to shed light on the evolutionary history, the relevant fitness effects and the likely molecular mechanisms of an adaptive mutation, and highlights the complexity of natural genetic variants.

Impacts of terraces on phylogenetic inference

Michael J Sanderson, Michelle M. McMahon, Alexandros Stamatakis, Derrick J. Zwickl, Mike Steel
Comments: 50 pages, 9 figures
Subjects: Populations and Evolution (q-bio.PE)

Terraces are potentially large sets of trees with precisely the same likelihood or parsimony score, which can be induced by missing sequences in partitioned multi-locus phylogenetic data matrices. The set of trees on a terrace can be characterized by enumeration algorithms or consensus methods that exploit the pattern of partial taxon coverage in the data, independent of the sequence data themselves. Terraces add ambiguity and complexity to phylogenetic inference particularly in settings where inference is already challenging: data sets with many taxa and relatively few loci. In this paper we present five new findings about terraces and their impacts on phylogenetic inference. First we clarify assumptions about model parameters that are necessary for the existence of terraces. Second, we explore the dependence of terrace size on partitioning scheme and indicate how to find the partitioning scheme associated with the largest terrace containing a given tree. Third, we highlight the impact of terraces on bootstrap estimates of confidence limits in clades, and characterize the surprising result that the bootstrap proportion for a clade can be entirely determined by the frequency of bipartitions on a terrace, with some bipartitions receiving high support even when incorrect. Fourth, we dissect some effects of prior distributions of edge lengths on the computed posterior probabilities of clades on terraces, to understand an example in which long edges “attract” each other in Bayesian inference. Fifth, we show that even if data are not partitioned, patterns of missing data studied in the terrace problem can lead to instances of apparent statistical inconsistency when even a small element of heterotachy is introduced to the model generating the sequence data. Finally, we discuss strategies for remediation of some of these problems.

CNVkit: Copy number detection and visualization for targeted sequencing using off-target reads

Eric Talevich, A. Hunter Shain, Boris C. Bastian
doi: http://dx.doi.org/10.1101/010876

Germline copy number variants (CNVs) and somatic copy number alterations (SCNAs) are of significant importance in syndromic conditions and cancer. Massively parallel sequencing is increasingly used to infer copy number information from variations in the read depth in sequencing data. However, this approach has limitations in the case of targeted re-sequencing, which leaves gaps in coverage between the regions chosen for enrichment and introduces biases related to the efficiency of target capture and library preparation. We present a method for copy number detection, implemented in the software package CNVkit, that uses both the targeted reads and the nonspecifically captured off-target reads to infer copy number evenly across the genome. This combination achieves both exon-level resolution in targeted regions and sufficient resolution in the larger intronic and intergenic regions to identify copy number changes. In particular, we successfully inferred copy number genome-wide at a resolution equivalent to 100 kilobases from a platform targeting as few as 293 genes. After normalizing read counts to a pooled reference, we evaluated and corrected for three sources of bias that explain most of the extraneous variability in the sequencing read depth: GC content, target footprint size and spacing, and repetitive sequences. We compared the performance of CNVkit to copy number changes identified by array comparative genomic hybridization. We packaged the components of CNVkit so that it is straightforward to use and provides visualizations, detailed reporting of significant features, and export options for compatibility with other software. Availability: http://github.com/etal/cnvkit
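The step of normalizing read depth to a pooled reference can be sketched in a few lines. This is an illustration of the general idea only, not CNVkit's implementation or API: the function names, the per-sample median scaling, the per-bin median pooling, and the `floor` guard against zero depths are all assumptions, and none of the bias corrections mentioned in the abstract (GC content, target size and spacing, repeats) are applied.

```python
import math
from statistics import median


def normalize_depths(depths):
    """Scale per-bin read depths so that the sample's median depth is 1."""
    m = median(depths)
    if m == 0:
        raise ValueError("median depth is zero; cannot normalize")
    return [d / m for d in depths]


def pooled_reference(normal_samples):
    """Per-bin median of median-normalized depths across normal samples."""
    normalized = [normalize_depths(s) for s in normal_samples]
    return [median(bin_values) for bin_values in zip(*normalized)]


def log2_ratios(sample_depths, reference, floor=1e-3):
    """Per-bin log2 ratio of a test sample against the pooled reference.

    `floor` (an assumed value) keeps empty bins from producing log2(0).
    """
    sample = normalize_depths(sample_depths)
    return [
        math.log2(max(s, floor) / max(r, floor))
        for s, r in zip(sample, reference)
    ]


# Toy data (hypothetical): four bins, two normal samples, one test sample.
normals = [[100, 100, 100, 100], [200, 200, 200, 200]]
tumor = [50, 50, 100, 50]
ratios = log2_ratios(tumor, pooled_reference(normals))
```

In this toy input the third bin has twice the relative depth of the others, so its log2 ratio is 1 while the remaining bins sit at 0.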

Genome-wide association study of carbon and nitrogen metabolism in the maize nested association mapping population

Nengyi Zhang, Yves Gibon, Nicholas Lepak, Pinghua Li, Lauren Dedow, Charles Chen, Yoon-Sup So, Jason Wallace, Karl Kremling, Peter Bradbury, Thomas Brutnell, Mark Stitt, Edward Buckler
doi: http://dx.doi.org/10.1101/010785

Carbon (C) and nitrogen (N) metabolism are critical to plant growth and development and at the basis of yield and adaptation. We have applied high throughput metabolite analyses to over 12,000 diverse field grown samples from the maize nested association mapping population. This allowed us to identify natural variation controlling the levels of twelve key C and N metabolites, often with single gene resolution. In addition to expected genes like invertases, critical natural variation was identified in key C4 metabolism genes like carbonic anhydrases and a malate transporter. Unlike prior maize studies, extensive pleiotropy was found for C and N metabolites. This integration of field-derived metabolite data with powerful mapping and genomics resources allows dissection of key metabolic pathways, providing avenues for future genetic improvement.

Introns structure patterns of variation in nucleotide composition in Arabidopsis thaliana and rice protein-coding genes

Adrienne Ressayre, Sylvain Glemin, Pierre Montalent, Laurana Serres-Giardi, Christine Dillmann, Johann Joets
doi: http://dx.doi.org/10.1101/010819

Plant genomes are large, intron-rich and present a wide range of variation in coding region G+C content. Concerning coding regions, a sort of syndrome can be described in plants: the increase in G+C content is associated with both the increase in heterogeneity among genes within a genome and the increase in variation across genes. Taking advantage of the large number of genes composing plant genomes and the wide range of variation in gene intron number, we performed a comprehensive survey of the patterns of variation in G+C content at different scales, from the nucleotide level to the genome scale, in two species, Arabidopsis thaliana and Oryza sativa, comparing the patterns in genes with different intron numbers. In both species, we observed a pervasive effect of gene intron number and location along genes on G+C content, codon and amino acid frequencies, suggesting that introns have a barrier effect structuring G+C content along genes. In external gene regions (located upstream of the first intron or downstream of the last intron), species-specific factors shape G+C content, while in internal gene regions (surrounded by introns), G+C content is constrained to remain within a range common to both species. In rice, introns appear to be a major determinant of gene G+C content, while in A. thaliana introns have a weaker but significant effect. The structuring effect of introns in both species may explain the G+C content syndrome observed in plants.
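Measuring G+C content along a gene, as opposed to a single value per gene, can be sketched as a sliding-window calculation. This is a minimal illustration and not the authors' pipeline: the function names, the window and step sizes, and the choice to ignore ambiguous bases such as N are assumptions.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases, or None if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        return None  # only ambiguous bases (e.g. N) in this sequence
    return gc / (gc + at)


def windowed_gc(seq, window, step):
    """G+C content in successive windows of length `window` along `seq`."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (hypothetical): a GC-rich start followed by an AT-rich end.
profile = windowed_gc("GGGGAAAA", window=4, step=4)
```

Comparing such profiles between regions upstream of the first intron, between introns, and downstream of the last intron is one simple way to look for the kind of positional structure the abstract describes.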

Genetic Studies of Physiological Traits with Their Application to Sleep Apnea

D.Y. Lee, C. Hanis, G.I. Bell, D.A. Aguilar, S. Redline, J. Below, M.M. Xiong
(Submitted on 27 Oct 2014)

Advances in modern sensing and sequencing technologies generate a deluge of high-dimensional spatio-temporal physiological and next-generation sequencing (NGS) data. Physiological traits are observed either as continuous random functions or on a dense grid, and are referred to as function-valued traits. Both physiological and NGS data are highly correlated, with an inherent order, spacing, and functional nature that are ignored by traditional summary-based univariate and multivariate regression methods designed for quantitative genetic analysis of scalar traits and common variants. To capture morphological and dynamic features of the data and utilize their dependence structure, we propose a functional linear model (FLM) in which a trait curve is modeled as a response function, the genetic variation in a genomic region or gene is modeled as a functional predictor, and the genetic effects are modeled as a function of both time and genomic position (FLMF), for genetic analysis of function-valued traits with both GWAS and NGS data. Through extensive simulations, we demonstrate that the FLMF has correct type 1 error rates and much higher power to detect association than existing methods. The FLMF is applied to sleep data from the Starr County health studies, in which oxygen saturation was measured over 22,670 seconds on average for 833 individuals. We found 65 genes that were significantly associated with the oxygen saturation functional trait, with P-values ranging from 2.40E-06 to 2.53E-21. The results clearly demonstrate that the FLMF substantially outperforms traditional genetic models with scalar traits.

Extensive capsule locus variation and large-scale genomic recombination within the Klebsiella pneumoniae clonal complex 258/11

Kelly L Wyres, Claire Gorrie, David J Edwards, Heiman FL Wertheim, Li Yang Hsu, Nguyen Van Kinh, Ruth Zadoks, Stephen Baker, Kathryn E Holt
doi: http://dx.doi.org/10.1101/010769

Klebsiella pneumoniae clonal complex (CC) 258/11, comprising sequence types (STs) 258, 11 and closely related STs, is associated with dissemination of the K. pneumoniae carbapenemase (KPC). Hospital outbreaks of KPC CC258/11 infections have been observed globally and are very difficult to treat. As a consequence, there is renewed interest in alternative infection control measures such as vaccines and phage or depolymerase treatments targeting the K. pneumoniae polysaccharide capsule. To date, 78 immunologically distinct capsule variants have been described in K. pneumoniae. Previous investigations of ST258 and a small number of closely related strains suggested capsular variation was limited within this clone; only two distinct ST258 capsular synthesis (cps) loci have been identified, both acquired through large-scale recombination events (>50 kbp). Here we report comparative genomic analysis of the broader K. pneumoniae CC258/11. Our data indicate that several large-scale recombination events have shaped the genomes of CC258/11, and that definition of the complex should be broadened to include ST395 (also reported to harbour KPC). We identified 11 different cps loci within CC258/11, suggesting that capsular switching is actually common within the complex. We also observed several insertion sequences (IS) within the cps loci, and show further diversification of two loci through IS activity. These findings suggest the capsular loci of clinically important K. pneumoniae are under diversifying selection, which alters our understanding of the evolution of this important clone and has implications for the design of control measures targeting the capsule.

Multicellularity makes cellular differentiation evolutionarily stable

Mary Elizabeth Wahl, Andrew Wood Murray
doi: http://dx.doi.org/10.1101/010728

Multicellularity and cellular differentiation, two traits shared by all developing organisms, have evolved independently in many taxa and are often found together in extant species. Differentiation, which we define as a permanent and heritable change in gene expression, produces somatic cells from a totipotent germ line. Though somatic cells may divide indefinitely, they cannot reproduce the complete organism and are thus effectively sterile on long timescales. How has differentiation evolved, repeatedly, despite the fitness costs of producing non-reproductive cells? The absence of extant unicellular differentiating species, as well as the persistence of undifferentiated multicellular groups among the volvocine algae and cyanobacteria, have fueled speculation that multicellularity must arise before differentiation can evolve. We propose that unicellular differentiating populations are intrinsically susceptible to invasion by non-differentiating mutants (“cheats”), whose spread eventually drives differentiating lineages extinct. To directly compare organisms which differ only in the presence or absence of these traits, we engineered both multicellularity and cellular differentiation in budding yeast, including such essential features as irreversible conversion, reproductive division of labor, and clonal multicellularity. We find that non-differentiating mutants overtake unicellular populations but are outcompeted effectively by multicellular differentiating strains, suggesting that multicellularity evolved before differentiation.