SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads
Yinlong Xie, Gengxiong Wu, Jingbo Tang, Ruibang Luo, Jordan Patterson, Shanlin Liu, Weihua Huang, Guangzhu He, Shengchang Gu, Shengkang Li, Xin Zhou, Tak-Wah Lam, Yingrui Li, Xun Xu, Gane Ka-Shu Wong, Jun Wang
(Submitted on 29 May 2013)
Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining the sequences of many (but not all) of the genes from an organism with no reference genome. With the rapidly increasing throughput and decreasing cost of next-generation sequencing, RNA-Seq has gained in popularity; but given the short reads (e.g. 2 × 90 bp paired ends), de novo assembly to recover complete full-length gene sequences remains an algorithmic challenge.
Results: We present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. Its performance was evaluated on 2 Gb and 5 Gb of transcriptome data from mouse and rice. Using the known transcripts from these two well-annotated genomes as a benchmark, we assessed how SOAPdenovo-Trans and other competing software handle the practical issues of alternative splicing and variable expression levels. Compared with other de novo transcriptome assemblers, SOAPdenovo-Trans provides higher contiguity, lower redundancy, and faster execution.