Sensitive Long-Indel-Aware Alignment of Sequencing Reads
Tobias Marschall, Alexander Schönhuth
(Submitted on 14 Mar 2013)
The tremendous advances in high-throughput sequencing technologies have made population-scale sequencing, as performed in the 1000 Genomes project and the Genome of the Netherlands project, possible. Next-generation sequencing has allowed genome-wide discovery of variations beyond single-nucleotide polymorphisms (SNPs), in particular of structural variations (SVs) like deletions, insertions, duplications, translocations, inversions, and even more complex rearrangements. Here, we design a read aligner with special emphasis on the following properties: (1) high sensitivity, i.e., finding all (reasonable) alignments; (2) ability to find (long) indels; (3) statistically sound alignment scores; and (4) runtime fast enough to be applied to whole-genome data. We compare performance to BWA, bowtie2, and stampy and find that our method is especially advantageous on reads containing larger indels.
Is the implementation available for testing?
We’ve already used it for re-aligning the reads from the 250 families (= 60 TB of data) of the Genome-of-the-Netherlands project, so it also works at large scale. We are only a few small steps away from distributing it.
We just released v2.0rc1 of the CLEVER Toolkit, including this read mapper (which is now named LASER). We appreciate any feedback! Link: http://code.google.com/p/clever-sv