Improved ribosome-footprint and mRNA measurements provide insights into dynamics and regulation of yeast translation
David E Weinberg, Premal Shah, Stephen W Eichhorn, Jeffrey A Hussmann, Joshua B Plotkin, David P Bartel
Ribosome-footprint profiling provides genome-wide snapshots of translation, but technical challenges can confound its analysis. Here, we use improved methods to obtain ribosome-footprint profiles and mRNA abundances that more faithfully reflect gene expression in Saccharomyces cerevisiae. Our results support proposals that both the beginning of coding regions and codons matching rare tRNAs are more slowly translated. They also indicate that emergent polypeptides with as few as three basic residues within a 10-residue window tend to slow translation. With the improved mRNA measurements, the variation attributable to translational control in exponentially growing yeast was less than previously reported, and most of this variation could be predicted with a simple model that considered mRNA abundance, upstream open reading frames, cap-proximal structure and nucleotide composition, and lengths of the coding and 5′-untranslated regions. Collectively, our results reveal key features of translational control in yeast and provide a framework for executing and interpreting ribosome-profiling studies.
Salmon: Accurate, Versatile and Ultrafast Quantification from RNA-seq Data using Lightweight-Alignment
Rob Patro, Geet Duggal, Carl Kingsford
Transcript quantification is a central task in the analysis of RNA-seq data. Accurate computational methods for the quantification of transcript abundances are essential for downstream analysis. However, most existing approaches are much slower than is necessary for their degree of accuracy. We introduce Salmon, a novel method and software tool for transcript quantification that exhibits state-of-the-art accuracy while being significantly faster than most other tools. Salmon achieves this through the combined application of a two-phase inference procedure, a reduced data representation, and a novel lightweight read alignment algorithm. Salmon is written in C++11, and is available under the GPL v3 license as open-source software at https://combine-lab.github.io/salmon.
Multi Loci Phylogenetic Analysis with Gene Tree Clustering
Ruriko Yoshida, Kenji Fukumizu
(Submitted on 26 Jun 2015)
Theory and empirical evidence clearly indicate that phylogenies (trees) of different genes (loci) should not display precisely matched topologies. The main reason for such phylogenetic incongruence is the reticulated evolutionary history of most species due to meiotic sexual recombination in eukaryotes, or horizontal transfers of genetic material in prokaryotes. Nevertheless, most genes should display topologically related phylogenies, and should group into one or more (for genetic hybrids) clusters in the “tree space.” In this paper we propose to apply the normalized-cut (Ncut) clustering algorithm to the set of gene trees with the geodesic distance between trees over the Billera-Holmes-Vogtmann (BHV) tree space. We first show with simulated data sets that the Ncut algorithm accurately clusters the set of gene trees given a species tree under the coalescent process, and show that the Ncut algorithm works better on the gene trees reconstructed via the neighbor-joining method than on those reconstructed via the maximum likelihood estimator under the evolutionary models. Moreover, we apply the methods to a genome-wide data set (1290 genes encoding 690,838 amino acid residues) on coelacanths, lungfishes, and tetrapods. The result suggests that there are two clusters in the data set. Finally we reconstruct the consensus trees from these two clusters; the consensus tree constructed from one cluster has the tree topology in which coelacanths are most closely related to the tetrapods, and the consensus tree from the other includes an irresolvable trichotomy over the coelacanth, lungfish, and tetrapod lineages, suggesting divergence within a very short time interval.
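To make the normalized-cut criterion mentioned in this abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a small, made-up matrix of pairwise tree distances (standing in for BHV geodesic distances, which are not computed here), converts distances to similarities with a Gaussian kernel (an assumption; the abstract does not say how similarities are built), and finds the best two-way split by brute force rather than by the usual spectral relaxation. The names `affinity`, `ncut_value`, `best_bipartition`, and the `scale` parameter are hypothetical.

```python
import math
from itertools import combinations


def affinity(dist, scale):
    """Turn a symmetric distance matrix into Gaussian similarities (assumed kernel)."""
    n = len(dist)
    return [
        [math.exp(-((dist[i][j] / scale) ** 2)) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def ncut_value(w, group_a):
    """Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    n = len(w)
    a = set(group_a)
    b = set(range(n)) - a
    cut = sum(w[i][j] for i in a for j in b)
    assoc_a = sum(w[i][j] for i in a for j in range(n))
    assoc_b = sum(w[i][j] for i in b for j in range(n))
    return cut / assoc_a + cut / assoc_b


def best_bipartition(w):
    """Exhaustively search two-way splits; only feasible for a handful of items."""
    n = len(w)
    best = None
    for size in range(1, n // 2 + 1):
        for group in combinations(range(n), size):
            score = ncut_value(w, group)
            if best is None or score < best[0]:
                best = (score, set(group))
    return best


# Toy distances between four hypothetical gene trees: two tight pairs, far apart.
toy_dist = [
    [0.0, 0.1, 2.0, 2.0],
    [0.1, 0.0, 2.0, 2.0],
    [2.0, 2.0, 0.0, 0.1],
    [2.0, 2.0, 0.1, 0.0],
]
score, cluster = best_bipartition(affinity(toy_dist, scale=1.0))
print(score, cluster)
```

On real data the exhaustive search is impractical, which is why Ncut is normally solved approximately through an eigenvector of the normalized graph Laplacian.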
Mitochondrial DNA Copy Number Variation Across Human Cancers
Ed Reznik, Martin Miller, Yasin Senbabaoglu, Nadeem Riaz, William Lee, Chris Sander
In cancer, mitochondrial dysfunction, through mutations, deletions, and changes in copy number of mitochondrial DNA (mtDNA), contributes to the malignant transformation and progression of tumors. Here, we report the first large-scale survey of mtDNA copy number variation across 21 distinct solid tumor types, examining over 13,000 tissue samples profiled with next-generation sequencing methods. We find a tendency for cancers, especially of the bladder and kidney, to be significantly depleted of mtDNA, relative to matched normal tissue. We show that mtDNA copy number is correlated with the expression of mitochondrially-localized metabolic pathways, suggesting that mtDNA copy number variation reflects gross changes in mitochondrial metabolic activity. Finally, we identify a subset of tumor-type-specific somatic alterations, including IDH1 and NF1 mutations in gliomas, whose incidence is strongly correlated with mtDNA copy number. Our findings suggest that modulation of mtDNA copy number may play a role in the pathology of cancer.
TransRate: reference free quality assessment of de-novo transcriptome assemblies
Richard D Smith-Unna, Chris Boursnell, Rob Patro, Julian M Hibberd, Steven Kelly
TransRate is a tool for reference-free quality assessment of de novo transcriptome assemblies. Using only sequenced reads as the input, TransRate measures the quality of individual contigs and whole assemblies, enabling assembly optimization and comparison. TransRate can accurately evaluate assemblies of conserved and novel RNA molecules of any kind in any species. We show that it is more accurate than comparable methods and demonstrate its use on a variety of data.
Detecting adaptive evolution in phylogenetic comparative analysis using the Ornstein-Uhlenbeck model
Clayton E. Cressler, Marguerite A. Butler, Aaron A. King
(Submitted on 25 Jun 2015)
Phylogenetic comparative analysis is an approach to inferring evolutionary process from a combination of phylogenetic and phenotypic data. The last few years have seen increasingly sophisticated models employed in the evaluation of more and more detailed evolutionary hypotheses, including adaptive hypotheses with multiple selective optima and hypotheses with rate variation within and across lineages. The statistical performance of these sophisticated models has received relatively little systematic attention, however. We conducted an extensive simulation study to quantify the statistical properties of a class of models toward the simpler end of the spectrum that model phenotypic evolution using Ornstein-Uhlenbeck processes. We focused on identifying where, how, and why these methods break down so that users can apply them with greater understanding of their strengths and weaknesses. Our analysis identifies three key determinants of performance: a discriminability ratio, a signal-to-noise ratio, and the number of taxa sampled. Interestingly, we find that model-selection power can be high even in regions that were previously thought to be difficult, such as when tree size is small. On the other hand, we find that model parameters are in many circumstances difficult to estimate accurately, indicating a relative paucity of information in the data relative to these parameters. Nevertheless, we note that accurate model selection is often possible when parameters are only weakly identified. Our results have implications for more sophisticated methods inasmuch as the latter are generalizations of the case we study.
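For readers unfamiliar with the Ornstein-Uhlenbeck process named in this abstract, the following Python sketch simulates a single trait along one lineage. It is an illustration only, not the authors' simulation code: it ignores the phylogeny entirely, and the function name `simulate_ou` and the parameter names (`theta` for the optimum, `alpha` for the strength of pull toward it, `sigma` for the noise intensity) are conventions assumed here. The update uses the exact Gaussian transition of the process, so the step size `dt` does not introduce discretization error.

```python
import math
import random


def simulate_ou(x0, theta, alpha, sigma, dt, n_steps, rng=None):
    """Simulate dX = alpha * (theta - X) dt + sigma dW on a regular time grid.

    Requires alpha > 0. Returns a list of n_steps + 1 trait values.
    """
    rng = rng or random.Random(0)  # fixed seed by default for reproducibility
    decay = math.exp(-alpha * dt)
    # Standard deviation of one exact transition over an interval of length dt.
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * alpha))
    path = [x0]
    for _ in range(n_steps):
        # The conditional mean relaxes exponentially toward the optimum theta.
        mean = theta + (path[-1] - theta) * decay
        path.append(mean + step_sd * rng.gauss(0.0, 1.0))
    return path


# A trait starting far from its optimum drifts toward it while fluctuating.
trajectory = simulate_ou(x0=0.0, theta=5.0, alpha=1.0, sigma=0.5, dt=0.1, n_steps=100)
print(trajectory[-1])
```

With `sigma` set to zero the path decays deterministically toward `theta`, which is a convenient sanity check on the update rule.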
SSCM: A method to analyze and predict the pathogenicity of sequence variants
Sharad Vikram, Matthew D Rasmussen, Eric A Evans, Imran S Haque
The advent of cost-effective DNA sequencing has provided clinics with high-resolution information about patients’ genetic variants, which has resulted in the need for efficient interpretation of this genomic data. Traditionally, variant interpretation has been dominated by many manual, time-consuming processes due to the disparate forms of relevant information in clinical databases and literature. Computational techniques promise to automate much of this, and while they currently play only a supporting role, their continued improvement for variant interpretation is necessary to tackle the problem of scaling genetic sequencing to ever larger populations. Here, we present SSCM-Pathogenic, a genome-wide, allele-specific score for predicting variant pathogenicity. The score, generated by a semi-supervised clustering algorithm, shows predictive power on clinically relevant mutations, while also displaying predictive ability in noncoding regions of the genome.