Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms
Rob Patro (1), Stephen M. Mount (2), Carl Kingsford (1) ((1) Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, (2) Department of Cell Biology and Molecular Genetics and Center for Bioinformatics and Computational Biology, University of Maryland)
(Submitted on 16 Aug 2013)
RNA-seq has rapidly become the de facto technique to measure gene expression. However, the time required for analysis has not kept up with the pace of data generation. Here we introduce Sailfish, a novel computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Sailfish entirely avoids mapping reads, which is a time-consuming step in all current methods. Sailfish provides quantification estimates much faster than existing approaches (typically 20-times faster) without loss of accuracy.
Sailfish, the “alignment-free” method, is a really neat approach to isoform quantification from RNA-seq data. I have posted a quick summary/comment on Sailfish here:
http://nextgenseek.com/2013/08/sailfish-isoform-quantitation-at-the-speed-of-making-a-cup-of-coffee/
Hope it is helpful.
See also Lior Pachter’s commentary here: http://liorpachter.wordpress.com/2013/08/20/sailfish/
Naively, given that Sailfish does not explicitly use the added specificity from long reads or mates in paired-end data, it would be interesting to see its performance on the synthetic data as a function of read length +/- pairing.
We suspect that the accuracy of Sailfish is due to a tradeoff between not using information about the co-occurrence of k-mers within a single read (or paired-end information) and bypassing problems with read mapping (e.g. what to do about reads that map to multiple places, and how much and what sort of errors to tolerate). As people start to use the tool, we are interested in hearing about performance on different data types. I’m thinking that important variables will be the quality of the transcriptome annotation and the extent (and range) of sequence divergence between the reference transcriptome and the sample, but there may be others. Also, we don’t see much sensitivity to the value of k, but that could be different with real data. We have set up a users’ group – sailfish-users on googlegroups.com (short URL http://ongen.us/SForum) – where people can share results.
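To make the co-occurrence point concrete, here is a minimal Python sketch of what is discarded when reads are broken into k-mers and the counts are pooled. It is only an illustration and not Sailfish's actual code: the function names `kmers` and `count_kmers`, the toy reads, and the choice k = 3 are all assumptions made up for this example.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Pool k-mer counts over all reads.

    Which read each k-mer came from (its co-occurrence with other
    k-mers) is not recorded; only the pooled counts survive.
    """
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


# Two hypothetical read sets: the reads differ, but every 3-mer
# appears the same number of times in each set.
reads_a = ["ACGTAC", "GTACGT"]
reads_b = ["CGTACG", "TACGTA"]

counts_a = count_kmers(reads_a, 3)
counts_b = count_kmers(reads_b, 3)

# The pooled k-mer counts cannot tell the two read sets apart.
print(counts_a == counts_b)
```

A method that works only from pooled counts like these sees the two read sets as identical, which is the information given up in exchange for never having to decide where, or how well, a read aligns.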