Evolutionary analysis across mammals reveals distinct classes of long noncoding RNAs
BACKGROUND: Recent advances in transcriptome sequencing have enabled the discovery of thousands of long noncoding RNAs (lncRNAs) across many species. Though several lncRNAs have been shown to play important roles in diverse biological processes, the functions and mechanisms of most lncRNAs remain unknown. Two significant obstacles lie between transcriptome sequencing and functional characterization of lncRNAs: 1) identifying truly noncoding genes from de novo reconstructed transcriptomes, and 2) prioritizing the hundreds of resulting putative lncRNAs from each sample for downstream experimental interrogation. RESULTS: We present slncky, a computational lncRNA discovery tool that produces a high-quality set of lncRNAs from RNA-sequencing data and further prioritizes lncRNAs by characterizing selective constraint as a proxy for function. Our filtering pipeline is comparable to manual curation efforts and more sensitive than previously published approaches. Further, we develop, for the first time, a sensitive pipeline for aligning lncRNA loci and propose new evolutionary metrics relevant to both sequence and transcript evolution. Our analysis reveals that selection acts in several distinct patterns and uncovers two notable classes of lncRNAs: one showing strong purifying selection on the RNA sequence, and another in which constraint is restricted to the regulation, but not the sequence, of the transcript. CONCLUSION: Our novel comparative methods for lncRNAs reveal 233 constrained lncRNAs among tens of thousands of currently annotated transcripts; we believe these should be prioritized for further interrogation. To aid in their analysis, we provide the slncky Evolution Browser as a resource for experimentalists.