This is a guest post by Yarden Katz [@yardenkatz] on his paper with coauthors: Katz et al., Sashimi plots: Quantitative visualization of RNA sequencing read alignments, available on arXiv.
A first draft of our paper Sashimi plots: Quantitative visualization of RNA sequencing read alignments is now available. Sashimi plots are a simple visualization of RNA sequencing data, intended to make it easier to detect differentially spliced exons across multiple RNA-Seq samples. In a Sashimi plot, RNA-Seq reads are summarized as read densities, and junction reads are collapsed into arcs whose width is proportional to the number of reads spanning the exons connected by the arc. See the paper for examples.
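To make the summarization step concrete, here is a minimal Python sketch of reducing read alignments to a read density and junction arc widths. It assumes a simplified, hypothetical representation in which each read is a list of aligned blocks in half-open genomic coordinates, and every gap between consecutive blocks is treated as a splice junction. Real tools work from BAM/CIGAR alignments instead. The names `summarize_reads`, `arc_widths`, and the `max_width` parameter are illustrative only and are not part of the Sashimi plot code.

```python
from collections import Counter

# Hypothetical read model: each read is a list of aligned blocks (start, end),
# half-open genomic coordinates. A gap between consecutive blocks is assumed to
# be a splice junction (real aligners also encode deletions, which we ignore).

def summarize_reads(reads, region_start, region_end):
    """Return per-base coverage over [region_start, region_end) and junction counts."""
    coverage = [0] * (region_end - region_start)
    junctions = Counter()
    for blocks in reads:
        for start, end in blocks:
            # Clip each aligned block to the plotted region before counting.
            for pos in range(max(start, region_start), min(end, region_end)):
                coverage[pos - region_start] += 1
        # Each pair of consecutive blocks defines one junction (donor end, acceptor start).
        for (_, donor), (acceptor, _) in zip(blocks, blocks[1:]):
            junctions[(donor, acceptor)] += 1
    return coverage, junctions

def arc_widths(junctions, max_width=5.0):
    """Scale arc widths so width is proportional to the junction read count."""
    if not junctions:
        return {}
    top = max(junctions.values())
    return {jx: max_width * n / top for jx, n in junctions.items()}

# Toy example: three exons, with one read skipping the middle exon.
reads = [
    [(100, 110)],              # unspliced read in exon 1
    [(105, 110), (200, 205)],  # exon 1 -> exon 2
    [(106, 110), (200, 206)],  # exon 1 -> exon 2
    [(107, 110), (300, 307)],  # exon 1 -> exon 3 (exon 2 skipped)
]
coverage, junctions = summarize_reads(reads, 100, 310)
widths = arc_widths(junctions)
print(dict(junctions))
print(widths)
```

In this toy case, the inclusion junction (110, 200) is supported by two reads and gets twice the arc width of the skipping junction (110, 300), which is the kind of difference a Sashimi plot is meant to make visible at a glance.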
We call it a Sashimi plot in part because of the impeccable resemblance of bumpy RNA-Seq read densities in exons to small pieces of Sashimi, and also because we tried to keep the plots as close to the “raw” data as possible. While Sashimi plots can display estimates of isoform abundance levels from programs like MISO, the goal here was to summarize the read alignments as they are, without further processing or inference, so that conclusions from probabilistic models can be visually verified.
The original Sashimi plot program is a command-line utility, written in Python with the matplotlib library, that makes customizable Sashimi plots. Recently, the IGV genome browser team implemented a version of Sashimi plots in their browser (see the installation instructions). This allows Sashimi plots to be made dynamically for any genomic region of interest, at a resolution set by the browser's zoom in/out features. The plot can be made for all or a subset of the loaded tracks, and the scales can be adjusted by the user as in the main IGV window. Both the static, Python-based version of Sashimi plots and the dynamic version within IGV are actively maintained, and the code for both is available on GitHub.
Sashimi plots still have important limitations. First, the junction arcs can get messy for genes with many alternative isoforms. This can be partially addressed by plotting simplified event annotations (e.g., ones containing only two isoforms, or a handful of isoforms, as in these annotations) rather than the full set of isoforms of a gene. Second, subtle differences in junction read counts are not always readily seen from arc widths. We're considering alternative representations (such as circle area or diameter) for quantitatively representing junction read counts.
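The two alternatives mentioned above differ in how a read count maps to circle size. Here is a minimal sketch, assuming (hypothetically) that sizes are scaled so the largest count gets a fixed `max_radius`. With area encoding the radius grows with the square root of the count, while with diameter encoding it grows linearly. The function name `circle_radii` and its parameters are illustrative and do not come from either Sashimi plot implementation.

```python
import math

def circle_radii(counts, max_radius=1.0, encoding="area"):
    """Map junction read counts to circle radii (hypothetical helper).

    encoding="area": circle area is proportional to the count, so radius ~ sqrt(count).
    encoding="diameter": diameter (and hence radius) is proportional to the count.
    """
    top = max(counts)
    if top == 0:
        # No reads at all: draw every circle with zero size.
        return [0.0 for _ in counts]
    if encoding == "area":
        return [max_radius * math.sqrt(c / top) for c in counts]
    if encoding == "diameter":
        return [max_radius * c / top for c in counts]
    raise ValueError(f"unknown encoding: {encoding!r}")

counts = [25, 100]
print(circle_radii(counts, encoding="area"))      # [0.5, 1.0]
print(circle_radii(counts, encoding="diameter"))  # [0.25, 1.0]
```

For a fourfold difference in junction reads, area encoding yields a twofold difference in radius, and diameter encoding yields a fourfold one. This is the trade-off between the two representations.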
The paper is meant primarily as an advertisement for the software. We hope that other members of the RNA processing/sequencing community will find these plots useful and come up with their own variants.
Relevant links:
- The Sashimi plot manual is here: http://genes.mit.edu/burgelab/miso/docs/sashimi.html
- GitHub repository for IGV/IGV-Sashimi: IGV at GitHub
- GitHub repository for Python, static Sashimi plots: Sashimi plot at GitHub