SISRS: SNP Identification from Short Read Sequences
Rachel S. Schwartz, Kelly Harkins, Anne C. Stone, Reed A. Cartwright
(Submitted on 16 May 2013)
One of the important challenges in modern phylogenetics is to identify data that can be used to resolve species relationships accurately. Whole-genome shotgun sequencing provides large amounts of data from which to identify phylogenetically informative sites; however, previous studies have required genome assembly or alignment to a reference genome, which is difficult when species are not closely related.
We have developed a pipeline to extract potentially informative sites directly from raw short-read sequence data. Reads are first assembled into conserved genome fragments and then aligned back to these fragments, after which informative sites are identified. This pipeline produced more than 14,000 informative sites from reads for 12 species of Leishmania and a reference genome. When analyzed using standard phylogenetic methods, these data resulted in a fully bifurcating tree with strongly supported nodes.
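To make the final step concrete, the following is a minimal sketch of how informative sites might be picked out once each species has a base call at every position of the assembled fragments. It is not the SISRS implementation. It assumes the common parsimony-informative criterion (at least two different bases, each seen in at least two species), which the text above does not specify, and the names `is_parsimony_informative` and `informative_sites` are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column, missing="N-?"):
    """Return True if a site has at least two bases that each occur in
    at least two species (assumed criterion, not taken from SISRS)."""
    counts = Counter(b for b in column.upper() if b not in missing)
    shared = [base for base, n in counts.items() if n >= 2]
    return len(shared) >= 2


def informative_sites(alignment):
    """Return 0-based positions of informative sites.

    `alignment` maps a species name to its base calls along the same
    fragments; all sequences must have equal length.
    """
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same length")
    return [
        i
        for i, column in enumerate(zip(*seqs))
        if is_parsimony_informative("".join(column))
    ]


# Toy data: four hypothetical species, five sites.
alignment = {
    "sp1": "ACGTA",
    "sp2": "ACGTA",
    "sp3": "ATGCA",
    "sp4": "ATGCN",
}
print(informative_sites(alignment))  # [1, 3]
```

In this toy input, sites 1 and 3 each split the species two against two, so they are kept. Site 4 is dropped because only one base remains once the missing call is ignored.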
Our procedure is implemented in the software SISRS (pronounced “scissors”), which is freely available at this https URL.