PrediXcan: Trait Mapping Using Human Transcriptome Regulation

Eric R Gamazon, Heather E Wheeler, Kaanan Shah, Sahar V Mozaffari, Keston Aquino-Michaels, Robert J Carroll, Anne E Eyler, Joshua C Denny, Dan L Nicolae, Nancy J Cox, Hae Kyung Im, GTEx Consortium
doi: http://dx.doi.org/10.1101/020164

Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual’s genetic profile and correlates the “imputed” gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. The genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome datasets. PrediXcan enjoys the benefits of gene-based approaches, such as reduced multiple testing burden, more comprehensive annotation of gene function compared to that derived from single variants, and a principled approach to the design of follow-up experiments, while also integrating knowledge of regulatory function. Since no actual expression data are used in the analysis of GWAS data (only in silico expression), reverse causality problems are largely avoided. PrediXcan harnesses reference transcriptome data for disease mapping studies. Our results demonstrate that PrediXcan can detect known and novel genes associated with disease traits and provide insights into the mechanism of these associations.
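The two steps described in the abstract (impute the genetically regulated expression, then correlate it with the phenotype) can be illustrated with a minimal sketch. This is not the PrediXcan implementation: the linear weighted-dosage predictor, the function names `impute_expression` and `association_stat`, the simulated genotypes, and the weights are all assumptions made for illustration. In practice the weights would come from prediction models trained on reference transcriptome data.

```python
import numpy as np


def impute_expression(genotypes, weights):
    """Predict genetically regulated expression for one gene.

    genotypes: (individuals x SNPs) matrix of allele dosages (0, 1 or 2).
    weights:   per-SNP weights (hypothetical here; assumed to come from a
               model trained on a reference transcriptome dataset).
    """
    return genotypes @ weights


def association_stat(predicted_expr, phenotype):
    """Pearson correlation between imputed expression and phenotype,
    plus the corresponding t statistic with n - 2 degrees of freedom."""
    r = np.corrcoef(predicted_expr, phenotype)[0, 1]
    n = len(phenotype)
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return r, t


# Simulated data only: no real genotypes, weights, or phenotypes are used.
rng = np.random.default_rng(0)
n_individuals, n_snps = 500, 20
genotypes = rng.integers(0, 3, size=(n_individuals, n_snps)).astype(float)
weights = rng.normal(0.0, 0.3, size=n_snps)  # hypothetical model weights
expr = impute_expression(genotypes, weights)

# Phenotype simulated to depend on the imputed expression plus noise.
phenotype = 0.5 * expr + rng.normal(0.0, 1.0, size=n_individuals)

r, t = association_stat(expr, phenotype)
print(f"correlation = {r:.3f}, t statistic = {t:.2f}")
```

Because only genotype-derived expression enters the test, the phenotype cannot feed back into the predictor, which is the intuition behind the abstract's remark on reverse causality.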

A Unified Architecture of Transcriptional Regulatory Elements

Robin Andersson, Albin Sandelin, Charles G Danko
doi: http://dx.doi.org/10.1101/019844

Gene expression is precisely controlled in time and space through the integration of signals that act at gene promoters and gene-distal enhancers. Classically, promoters and enhancers are considered separate classes of regulatory elements, often distinguished by histone modifications. However, recent studies have revealed broad similarities between enhancers and promoters, blurring the distinction: active enhancers often initiate transcription, and some gene promoters have the potential to enhance the transcriptional output of other promoters. Here, we propose a model in which promoters and enhancers are considered a single class of functional element, with a unified architecture for transcription initiation. The context of interacting regulatory elements and the surrounding sequences together determine local transcriptional output as well as the enhancer and promoter activities of individual elements.

Determining Exon Connectivity in Complex mRNAs by Nanopore Sequencing

Mohan Bolisetty, Gopinath Rajadinakaran, Brenton Graveley
doi: http://dx.doi.org/10.1101/019752

Though powerful, short-read high-throughput RNA sequencing is limited in its ability to directly measure exon connectivity in mRNAs containing multiple alternative exons located farther apart than the maximum read length. Here, we use the Oxford Nanopore MinION™ sequencer to identify 7,899 ‘full-length’ isoforms expressed from four Drosophila genes, Dscam1, MRP, Mhc, and Rdl. These results demonstrate that nanopore sequencing can be used to deconvolute individual isoforms and that it has the potential to be an important method for comprehensive transcriptome characterization.

Dynamics of Wolbachia pipientis gene expression across the Drosophila melanogaster life cycle

Florence Gutzwiller, Catarina R. Carmo, Danny E. Miller, Danny W. Rice, Irene L. Newton, Luis Teixeira, Casey M. Bergman
(Submitted on 21 May 2015)

Symbiotic interactions between microbes and their multicellular hosts have manifold impacts on molecular, cellular and organismal biology. To identify candidate bacterial genes involved in maintaining endosymbiotic associations with insect hosts, we analyzed genome-wide patterns of gene expression in the alpha-proteobacterium Wolbachia pipientis across the life cycle of Drosophila melanogaster using public data from the modENCODE project that were generated in a Wolbachia-infected version of the ISO1 reference strain. We find that the majority of Wolbachia genes are expressed at detectable levels in D. melanogaster across the entire life cycle, but that only 7.8% of 1195 Wolbachia genes exhibit robust stage- or sex-specific expression differences when studied in the “holo-organism” context. Wolbachia genes that are differentially expressed during development are typically up-regulated after D. melanogaster embryogenesis, and include many bacterial membrane, secretion system and ankyrin-repeat containing proteins. Sex-biased genes are often organised as small operons of uncharacterised genes and are mainly up-regulated in adult male D. melanogaster in an age-dependent manner, suggesting a potential role in cytoplasmic incompatibility. Our results indicate that large changes in Wolbachia gene expression across the Drosophila life cycle are relatively rare when assayed across all host tissues, but that candidate genes for understanding host-microbe interactions in facultative endosymbionts can be successfully identified using holo-organism expression profiling. Our work also shows that mining public gene expression data in D. melanogaster provides a rich set of resources to probe the functional basis of the Wolbachia-Drosophila symbiosis and annotate the transcriptional outputs of the Wolbachia genome.

Near-optimal RNA-Seq quantification

Nicolas Bray, Harold Pimentel, Páll Melsted, Lior Pachter
Subjects: Quantitative Methods (q-bio.QM); Computational Engineering, Finance, and Science (cs.CE); Data Structures and Algorithms (cs.DS); Genomics (q-bio.GN)

We present a novel approach to RNA-Seq quantification that is near optimal in speed and accuracy. Software implementing the approach, called kallisto, can be used to analyze 30 million unaligned RNA-Seq reads in less than 5 minutes on a standard laptop computer while providing results as accurate as those of the best existing tools. This removes a major computational bottleneck in RNA-Seq analysis.

A high-throughput RNA-seq approach to profile transcriptional responses

Gregory A Moyerbrailean, Gordon O Davis, Chris T Harvey, Donovan Watza, Xiaoquan Wen, Roger Pique-Regi, Francesca Luca
doi: http://dx.doi.org/10.1101/018416

In recent years, different technologies have been used to measure genome-wide gene expression levels and to study the transcriptome across many types of tissues and in response to in vitro treatments. However, a full understanding of gene regulation in any given combination of cellular and environmental contexts is still missing. This is partly because analyzing tissue/environment-specific gene expression generally implies screening a large number of cellular conditions and samples, without prior knowledge of which conditions are most informative (e.g. some cell types may not respond to certain treatments). To circumvent these challenges, we have established a new two-step high-throughput and cost-effective RNA-seq approach: the first step consists of gene expression screening of a large number of conditions, while the second step focuses on deep sequencing of the most relevant conditions (e.g. largest number of differentially expressed genes). This study design allows for a fast and economical screen in step one, with a more profitable allocation of resources for the deep sequencing of re-pooled libraries in step two. We have applied this approach to study the response to 26 treatments in three lymphoblastoid cell line samples and we show that it is applicable to other high-throughput transcriptome profiling studies requiring iterative refinement or screening.
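The hand-off between the two steps can be sketched as a simple ranking, assuming the relevance criterion is the one the abstract gives as an example (the number of differentially expressed genes found in the shallow screen). The function name `select_conditions_for_deep_sequencing`, the treatment labels, and the counts are hypothetical and are not taken from the study.

```python
def select_conditions_for_deep_sequencing(de_gene_counts, top_k):
    """Rank screened conditions by differentially expressed (DE) gene count
    and return the names of the top_k conditions.

    de_gene_counts: dict mapping condition name -> DE gene count from the
                    step-one shallow screen (hypothetical values here).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(de_gene_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_k]]


# Made-up screening results for four hypothetical treatments.
screen_results = {
    "treatment_A": 120,
    "treatment_B": 3,
    "treatment_C": 45,
    "treatment_D": 0,
}

selected = select_conditions_for_deep_sequencing(screen_results, top_k=2)
print(selected)  # ['treatment_A', 'treatment_C']
```

Only the selected conditions would then be re-pooled for deep sequencing in step two; a real analysis might use a different relevance score or a threshold rather than a fixed `top_k`.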

Methods for distinguishing between protein-coding and long noncoding RNAs and the elusive biological purpose of translation of long noncoding RNAs

Gali Housman, Igor Ulitsky
doi: http://dx.doi.org/10.1101/017889

Long noncoding RNAs (lncRNAs) are a diverse class of RNAs with increasingly appreciated functions in vertebrates, yet much of their biology remains poorly understood. In particular, it is unclear to what extent the current catalog of over 10,000 distinct annotated lncRNAs is indeed devoid of genes coding for proteins. Here we review the available computational and experimental schemes for distinguishing between protein-coding and long noncoding RNAs, along with their recent genome-wide applications. We conclude that the model most consistent with available data is that a large number of mammalian lncRNAs undergo translation, but only a very small minority of such translation events result in stable and functional peptides. The outcome of the majority of the translation events and their potential biological purposes remain an intriguing topic for future investigation.