mTim: Rapid and accurate transcript reconstruction from RNA-Seq data


Georg Zeller, Nico Goernitz, Andre Kahles, Jonas Behr, Pramod Mudrakarta, Soeren Sonnenburg, Gunnar Raetsch
(Submitted on 20 Sep 2013)

Recent advances in high-throughput cDNA sequencing (RNA-Seq) technology have revolutionized transcriptome studies. A major motivation for RNA-Seq is to map the structure of expressed transcripts at nucleotide resolution. With accurate computational tools for transcript reconstruction, this technology may also become useful for genome (re-)annotation, which has mostly relied on de novo gene finding, where gene structures are primarily inferred from the genome sequence. We developed a machine-learning method called mTim (margin-based transcript inference method) for transcript reconstruction from RNA-Seq read alignments; it is based on discriminatively trained hidden Markov support vector machines. In addition to features derived from read alignments, it utilizes characteristic genomic sequences, e.g. around splice sites, to improve transcript predictions. mTim inferred transcripts that were highly accurate and relatively robust to alignment errors in comparison to those from Cufflinks, a widely used transcript assembly method.

Change point analysis of histone modifications reveals epigenetic blocks with distinct regulatory activity and biological functions

Mengjie Chen, Haifan Lin, Hongyu Zhao
(Submitted on 20 Sep 2013)

Histone modification is a vital epigenetic mechanism for transcriptional control in eukaryotes. High-throughput techniques have enabled whole-genome analysis of histone modifications in recent years. However, most studies assume that one combination of histone modifications invariantly translates to one transcriptional output regardless of the local chromatin environment. In this study we hypothesize that the genome is organized into local domains that manifest similar enrichment patterns of histone modifications, leading to orchestrated regulation of expression of genes with relevant biological functions. We propose a multivariate Bayesian Change Point (BCP) model to segment the Drosophila melanogaster genome into consecutive blocks on the basis of combinatorial patterns of histone marks. By modeling the sparse distribution of histone marks across the chromosome with a zero-inflated Gaussian mixture, our partitions capture local BLOCKs that manifest relatively homogeneous enrichment patterns of histone modifications. We further characterized BLOCKs by their transcription levels, distribution of genes, binding profiles of a broad panel of chromatin proteins, degree of co-expression and GO enrichment. Our results demonstrate that these blocks, although inferred merely from histone modifications, are strongly associated with transcription events and chromatin organization, which suggests that they play important roles in coordinated gene regulation.

A Gene Regulatory Model of heterosis and speciation

Peter M. F. Emmrich, Vera Pancaldi, Hannah E. Roberts, Krystyna A. Kelly, David C. Baulcombe
(Submitted on 15 Sep 2013)

Crossing individuals from genetically distinct populations often results in improvements in quantitative traits, such as growth rate, biomass production and stress resistance; this phenomenon is known as heterosis. We have taken a computational approach to explore the mechanisms underlying heterosis, developing a simulation of evolution and hybridization of Gene Regulatory Networks (GRNs) in a Boolean framework. These artificial regulatory networks exhibit biologically realistic topological properties, and fitness is measured as the ability of a network to respond to external inputs in the correct way. Our model reproduced experimental observations from the literature on heterosis using only biologically meaningful parameters, such as mutation rates. Hybrid vigor was observed, and its extent increased as parental populations diverged until it collapsed once the two populations had become incompatible. Thus, the model also describes a process of speciation and links it to collapsing hybrid fitness due to genetic incompatibility of the separated populations. We also reproduce, for the first time in a model, the fact that hybrid vigor cannot easily be fixed by crossing hybrids, which is currently an important drawback of the use of hybrid crops. The simulation allows us to study the effects of three standard models for the genetic basis of heterosis: dominance, over-dominance, and epistasis. In our simulation over-dominance is the main factor contributing to hybrid vigor, whereas under-dominance and epistatic incompatibility are responsible for the fitness collapse. As the parental populations diverge, a single mutation can cause an almost sudden incompatibility, leading to low-fitness hybrids.
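To make the idea of fitness in a Boolean regulatory network concrete, here is a minimal sketch. It is not the authors' simulation: the threshold update rule, the clamping of input genes, the three-gene network, and all function names are assumptions chosen for illustration. Fitness is scored as the fraction of external input combinations for which the output gene reaches the desired state.

```python
from itertools import product

def step(state, weights):
    """One synchronous update: gene i switches on if its weighted input sum is positive."""
    n = len(state)
    return tuple(
        1 if sum(weights[i][j] * state[j] for j in range(n)) > 0 else 0
        for i in range(n)
    )

def respond(inputs, weights, n_steps=5):
    """Clamp the input genes, iterate the network, and return the last gene's state."""
    n = len(weights)
    state = tuple(inputs) + (0,) * (n - len(inputs))
    for _ in range(n_steps):
        state = step(state, weights)
        # External inputs are held fixed (an assumption of this sketch).
        state = tuple(inputs) + state[len(inputs):]
    return state[-1]

def fitness(weights, target, n_inputs=2):
    """Fraction of input combinations for which the output matches the target."""
    cases = list(product((0, 1), repeat=n_inputs))
    hits = sum(respond(c, weights) == target(c) for c in cases)
    return hits / len(cases)

# Hypothetical network: genes 0 and 1 are inputs, gene 2 is the output.
# Gene 2 is activated by gene 0 and repressed by gene 1.
weights = [
    [0, 0, 0],
    [0, 0, 0],
    [1, -1, 0],
]

def target_a_not_b(c):
    return int(c[0] == 1 and c[1] == 0)

def target_or(c):
    return int(c[0] == 1 or c[1] == 1)

fit_matched = fitness(weights, target_a_not_b)
fit_mismatched = fitness(weights, target_or)
```

In a fuller simulation, mutation would perturb `weights` and hybridization would combine rows from two diverged parental networks before scoring them with the same `fitness` function.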

Fast Approximate Inference of Transcript Expression Levels from RNA-seq Data


James Hensman, Peter Glaus, Antti Honkela, Magnus Rattray
(Submitted on 27 Aug 2013)

Motivation: The mapping of RNA-seq reads to their transcripts of origin is a fundamental task in transcript expression estimation and differential expression scoring. Where ambiguities in mapping exist due to transcripts sharing sequence, e.g. alternative isoforms or alleles, the problem becomes an instance of non-trivial probabilistic inference. Bayesian inference in such a problem is intractable, and approximate methods such as Markov chain Monte Carlo (MCMC) and Variational Bayes must be used. Standard implementations of these methods can be prohibitively slow for large datasets and complex gene models.
Results: We propose an approximate inference scheme based on Variational Bayes applied to an existing model of transcript expression inference from RNA-seq data. We apply recent advances in Variational Bayes algorithmics to improve the convergence of the algorithm beyond the standard variational expectation-maximisation approach. We apply our algorithm to simulated and biological datasets, demonstrating that the increase in speed requires only a small trade-off in accuracy of expression level estimation.
Availability: The methods were implemented in R and C++, and are available as part of the BitSeq project at this https URL. The methods will be made available through the BitSeq Bioconductor package at the next stable release.

Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms

Rob Patro (1), Stephen M. Mount (2), Carl Kingsford (1) ((1) Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, (2) Department of Cell Biology and Molecular Genetics and Center for Bioinformatics and Computational Biology, University of Maryland)
(Submitted on 16 Aug 2013)

RNA-seq has rapidly become the de facto technique to measure gene expression. However, the time required for analysis has not kept up with the pace of data generation. Here we introduce Sailfish, a novel computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Sailfish entirely avoids mapping reads, which is a time-consuming step in all current methods. Sailfish provides quantification estimates much faster than existing approaches (typically 20 times faster) without loss of accuracy.
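The abstract does not spell out how read mapping is avoided. One common alignment-free strategy is to count k-mers that reads share with the annotated transcripts, and the toy sketch below illustrates only that general idea. It is not Sailfish's algorithm: the function names, the tiny sequences, and the choice to count only k-mers unique to one transcript are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_index(transcripts, k):
    """Map each k-mer to the set of transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index

def count_unique_kmers(reads, index, k):
    """Count read k-mers per transcript, skipping k-mers shared by several transcripts."""
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            owners = index.get(km, ())
            if len(owners) == 1:
                counts[next(iter(owners))] += 1
    return counts

# Hypothetical toy data: two transcripts sharing the prefix ACGT.
transcripts = {"tx_a": "ACGTACGGA", "tx_b": "ACGTTTTGC"}
reads = ["CGTACG", "TTTTGC", "ACGT"]

index = build_index(transcripts, k=4)
counts = count_unique_kmers(reads, index, k=4)
```

A real quantifier would also have to apportion the shared k-mers among isoforms (for example with an expectation-maximization procedure) and normalize by transcript length; this sketch deliberately omits both.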

Realistic simulations reveal extensive sample-specificity of RNA-seq biases

Botond Sipos, Greg Slodkowicz, Tim Massingham, Nick Goldman
(Submitted on 14 Aug 2013)

In line with the importance of RNA-seq, the bioinformatics community has produced numerous data analysis tools incorporating methods to correct sample-specific biases. However, few advanced simulation tools exist to enable benchmarking of competing correction methods. We introduce the first framework to reproduce the properties of individual RNA-seq runs and, by applying it to several datasets, we demonstrate the importance of accounting for sample-specificity in realistic simulations.

Cell-cycle regulated transcription associates with DNA replication timing in yeast and human

Hunter B. Fraser
(Submitted on 8 Aug 2013)

Eukaryotic DNA replication follows a specific temporal program, with some genomic regions consistently replicating earlier than others, yet what determines this program is largely unknown. Highly transcribed regions have been observed to replicate in early S-phase in all plant and animal species studied to date, but this relationship is thought to be absent from both budding yeast and fission yeast. No association between cell-cycle regulated transcription and replication timing has been reported for any species. Here I show that in budding yeast, fission yeast, and human, the genes most highly transcribed during S-phase replicate early, whereas those repressed in S-phase replicate late. Transcription during other cell-cycle phases shows either the opposite correlation with replication timing, or no relation. The relationship is strongest near late-firing origins of replication, which is not consistent with a previously proposed model — that replication timing may affect transcription — and instead suggests a potential mechanism involving the recruitment of limiting replication initiation factors during S-phase. These results suggest that S-phase transcription may be an important determinant of DNA replication timing across eukaryotes, which may explain the well-established association between transcription and replication timing.

A novel test for selection on cis-regulatory elements reveals positive and negative selection acting on mammalian transcriptional enhancers

Justin D. Smith, Kimberly F. McManus, Hunter B. Fraser
(Submitted on 7 Aug 2013)

Measuring natural selection on genomic elements involved in the cis-regulation of gene expression — such as transcriptional enhancers and promoters — is critical for understanding the evolution of genomes, yet it remains a major challenge. Many studies have attempted to detect positive or negative selection in these noncoding elements by searching for those with the fastest or slowest rates of evolution, but this can be problematic. Here we introduce a new approach to this issue, and demonstrate its utility on three mammalian transcriptional enhancers. Using results from saturation mutagenesis studies of these enhancers, we classified all possible point mutations as up-regulating, down-regulating, or silent, and determined which of these mutations have occurred on each branch of a phylogeny. Applying a framework analogous to Ka/Ks in protein-coding genes, we measured the strength of selection on up-regulating and down-regulating mutations, in specific branches as well as entire phylogenies. We discovered distinct modes of selection acting on different enhancers: while all three have experienced negative selection against down-regulating mutations, the selection pressures on up-regulating mutations vary. In one case we detected positive selection for up-regulation, while the other two had no detectable selection on up-regulating mutations. Our methodology is applicable to the growing number of saturation mutagenesis data sets, and provides a detailed picture of the mode and strength of natural selection acting on cis-regulatory elements.
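As a rough illustration of a Ka/Ks-like ratio for enhancers, the sketch below divides the per-opportunity substitution rate of one mutation class (up-regulating or down-regulating) by the per-opportunity rate of silent mutations. This is a simplified assumption, not the authors' estimator: it ignores multiple hits, mutational biases, and statistical testing, and the function name and all counts are hypothetical. Values below 1 would suggest negative selection on that class and values above 1 positive selection.

```python
def selection_ratio(observed, possible, observed_silent, possible_silent):
    """Rate of substitutions in a functional class relative to the silent class.

    observed / possible:               substitutions seen vs. point mutations
                                       available in the class of interest
    observed_silent / possible_silent: the same quantities for silent mutations
    """
    if possible <= 0 or possible_silent <= 0:
        raise ValueError("the number of possible mutations must be positive")
    if observed_silent == 0:
        raise ValueError("no silent substitutions observed; ratio is undefined")
    class_rate = observed / possible
    silent_rate = observed_silent / possible_silent
    return class_rate / silent_rate

# Hypothetical counts for one enhancer on one branch of a phylogeny.
ratio_down = selection_ratio(observed=2, possible=400,
                             observed_silent=10, possible_silent=500)
ratio_up = selection_ratio(observed=6, possible=100,
                           observed_silent=10, possible_silent=500)
```

With these made-up counts, down-regulating mutations accumulate at about a quarter of the silent rate, while up-regulating mutations accumulate at about three times the silent rate.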

The molecular mechanism of a cis-regulatory adaptation in yeast

Jessica Chang, Yiqi Zhou, Xiaoli Hu, Lucia Lam, Cameron Henry, Erin M. Green, Ryosuke Kita, Michael S. Kobor, Hunter B. Fraser
(Submitted on 7 Aug 2013)

Despite recent advances in our ability to detect adaptive evolution involving the cis-regulation of gene expression, our knowledge of the molecular mechanisms underlying these adaptations has lagged far behind. Across all model organisms the causal mutations have been discovered for only a handful of gene expression adaptations, and even for these, mechanistic details (e.g. the trans-regulatory factors involved) have not been determined. We previously reported a polygenic gene expression adaptation involving down-regulation of the ergosterol biosynthesis pathway in the budding yeast Saccharomyces cerevisiae. Here we investigate the molecular mechanism of a cis-acting mutation affecting a member of this pathway, ERG28. We show that the causal mutation is a two-base deletion in the promoter of ERG28 that strongly reduces the binding of two transcription factors, Sok2 and Mot3, thus abolishing their regulation of ERG28. This down-regulation increases resistance to a widely used antifungal drug targeting ergosterol, similar to mutations disrupting this pathway in clinical yeast isolates. The identification of the causal genetic variant revealed that the selection likely occurred after the deletion was already present at high frequency in the population, rather than when it was a new mutation. These results provide a detailed view of the molecular mechanism of a cis-regulatory adaptation, and underscore the importance of this view to our understanding of evolution at the molecular level.

Comprehensive analysis of imprinted genes in maize reveals limited conservation with other species and allelic variation for imprinting

Amanda J. Waters, Paul Bilinski, Steve R. Eichten, Matthew W. Vaughn, Jeffrey Ross-Ibarra, Mary Gehring, Nathan M. Springer
(Submitted on 29 Jul 2013)

In plants, a subset of genes exhibit imprinting in endosperm tissue such that expression is primarily from the maternal or paternal allele. Imprinting may arise as a consequence of mechanisms for silencing of transposons during reproduction, and in some cases imprinted expression of particular genes may provide a selective advantage such that it is conserved across species. Separate mechanisms for the origin of imprinted expression patterns and maintenance of these patterns may result in substantial variation in the targets of imprinting in different species. Here we present deep sequencing of RNAs isolated from reciprocal crosses of four diverse maize genotypes, providing a comprehensive analysis of imprinting in maize that allows evaluation of imprinting at more than 95% of endosperm-expressed genes. We find that over 500 genes exhibit statistically significant parent-of-origin effects in maize endosperm tissue, but we focus our analyses on a subset of these genes that had >90% expression from the maternal allele (69 genes) or from the paternal allele (108 genes) in at least one reciprocal cross. Over 10% of imprinted genes show evidence of allelic variation for imprinting. A comparison of imprinting in maize and rice reveals that only 13% of genes with syntenic orthologs in both species exhibit conserved imprinting. Genes that exhibit conserved imprinting in maize relative to rice have elevated dN/dS ratios compared to other imprinted genes, suggesting a history of more rapid evolution. Together, these data suggest that imprinting only has functional relevance at a subset of loci that currently exhibit imprinting in maize.
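The >90% filter described above can be sketched as a simple threshold on allele-specific read counts. This is only an illustration under stated assumptions: the function name, the input layout (one pair of maternal and paternal read counts per reciprocal cross), and the example counts are hypothetical, and the statistical test for parent-of-origin effects that precedes the filter in the abstract is omitted.

```python
def classify_imprinting(crosses, threshold=0.9):
    """Label a gene from allele-specific read counts.

    crosses: list of (maternal_reads, paternal_reads) tuples, one per
             reciprocal cross (hypothetical input layout).
    Returns "maternal" or "paternal" if that parent's allele exceeds the
    threshold fraction of reads in at least one cross, else "not called".
    """
    for maternal, paternal in crosses:
        total = maternal + paternal
        if total == 0:
            continue  # no informative reads in this cross
        if maternal > threshold * total:
            return "maternal"
        if paternal > threshold * total:
            return "paternal"
    return "not called"

# Hypothetical read counts for three genes across two reciprocal crosses.
gene_1 = classify_imprinting([(95, 5), (80, 20)])
gene_2 = classify_imprinting([(30, 70), (3, 97)])
gene_3 = classify_imprinting([(60, 40), (70, 30)])
```

Note that endosperm carries two maternal genome copies and one paternal copy, so a non-imprinted gene is expected to show roughly two-thirds maternal expression; a fixed 90% cutoff is therefore more stringent for paternal than for maternal bias.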