Cloudbreak: Accurate and Scalable Genomic Structural Variation Detection in the Cloud with MapReduce

Christopher W. Whelan, Jeffrey Tyner, Alberto L’Abbate, Clelia Tiziana Storlazzi, Lucia Carbone, Kemal Sönmez
(Submitted on 9 Jul 2013)

The detection of genomic structural variations (SV) remains a difficult challenge in analyzing sequencing data, and the growing size and number of sequenced genomes have rendered SV detection a bona fide big data problem. MapReduce is a proven, scalable solution for distributed computing on huge data sets. We describe a conceptual framework for SV detection algorithms in MapReduce based on computing local genomic features, and use it to develop a deletion and insertion detection algorithm, Cloudbreak. On simulated and real data sets, Cloudbreak achieves accuracy improvements over popular SV detection algorithms, and genotypes variants from diploid samples. It provides dramatically shorter runtimes and the ability to scale to big data volumes on large compute clusters. Cloudbreak includes tools to set up and configure MapReduce (Hadoop) clusters on cloud services, enabling on-demand cluster computing. Our implementation and source code are available at this http URL
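
The abstract describes SV detection in MapReduce as computing local genomic features. The sketch below shows that pattern on a single machine, and it is not Cloudbreak's actual algorithm. It assumes hypothetical inputs in which each read pair is `(chrom, left_end, right_start, insert_size)`. The map step emits the pair's insert size for every fixed-width window between its two reads. The shuffle step groups values by `(chrom, window)`. The reduce step computes a simple stand-in feature, the mean insert-size shift from an assumed library mean, and flags windows with a large positive shift as deletion candidates. The names `WINDOW`, `map_pair`, `shuffle`, `reduce_window`, and `call_deletion_windows` are hypothetical, as are the thresholds.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical read-pair record: (chromosome, end of left read,
# start of right read, observed insert size reported by the aligner).
ReadPair = Tuple[str, int, int, int]
Key = Tuple[str, int]  # (chromosome, window index)

WINDOW = 25  # hypothetical window width in bp


def map_pair(pair: ReadPair) -> Iterator[Tuple[Key, int]]:
    """Map step: emit the insert size for each window between the two reads."""
    chrom, left_end, right_start, insert_size = pair
    for w in range(left_end // WINDOW, right_start // WINDOW + 1):
        yield (chrom, w), insert_size


def shuffle(kvs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Shuffle step: group mapped values by key, as the framework would."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in kvs:
        groups[key].append(value)
    return groups


def reduce_window(values: List[int], expected: float) -> float:
    """Reduce step: a simple local feature, the mean insert-size shift."""
    return mean(values) - expected


def call_deletion_windows(
    pairs: Iterable[ReadPair],
    expected: float = 300.0,   # assumed library mean insert size
    min_shift: float = 100.0,  # assumed shift threshold for a deletion
    min_support: int = 2,      # assumed minimum number of spanning pairs
) -> List[Key]:
    """Run map -> shuffle -> reduce and return flagged windows in order."""
    mapped = (kv for p in pairs for kv in map_pair(p))
    calls = []
    for key, values in sorted(shuffle(mapped).items()):
        if len(values) >= min_support and reduce_window(values, expected) >= min_shift:
            calls.append(key)
    return calls


# Toy data: two concordant pairs, plus two pairs whose insert size is
# inflated by about 500 bp, as pairs spanning a deletion would be.
example_pairs: List[ReadPair] = [
    ("chr1", 100, 350, 300),
    ("chr1", 110, 360, 300),
    ("chr1", 500, 1250, 800),
    ("chr1", 510, 1260, 800),
]

if __name__ == "__main__":
    print(call_deletion_windows(example_pairs))
```

On the toy data, only windows 20 through 50 on `chr1` are flagged. Those windows lie between the reads of the two discordant pairs, and each is supported by two pairs with a 500 bp mean shift. Because each window is reduced independently, a real MapReduce job could distribute the windows across reducers. The per-window mean used here is only a placeholder for whatever feature a real SV caller computes.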
