BGT: efficient and flexible genotype query across many samples
Heng Li
Subjects: Genomics (q-bio.GN)
Summary: BGT is a compact format, a fast command-line tool and a simple web application for efficiently and conveniently querying whole-genome genotypes and frequencies across tens to hundreds of thousands of samples. On real data, it encodes the haplotypes of 32,488 samples across 39.2 million SNPs into a 7.4 GB database and decodes a couple of hundred million genotypes per CPU second. This performance enables real-time responses to complex queries.
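As a rough back-of-envelope check, the figures quoted in the summary can be turned into storage per genotype and the CPU time for one full pass over the data. This is only a sketch: it assumes "GB" means 10^9 bytes and reads "a couple of hundred million" as 2 x 10^8 genotypes per CPU second, neither of which the summary states exactly.

```python
# Figures quoted in the summary.
n_samples = 32_488
n_snps = 39_200_000
db_bytes = 7.4e9  # assumption: "GB" is taken as 10**9 bytes

# Assumption: "a couple of hundred million" is read as 2e8 genotypes per CPU second.
decode_rate = 2e8

# One genotype per sample per SNP.
n_genotypes = n_samples * n_snps

# Average storage cost of a single genotype, in bits.
bits_per_genotype = db_bytes * 8 / n_genotypes

# CPU time needed to decode every genotype once at the assumed rate.
full_scan_cpu_seconds = n_genotypes / decode_rate

print(f"genotypes stored:        {n_genotypes:.3e}")
print(f"bits per genotype:       {bits_per_genotype:.4f}")
print(f"CPU hours for full scan: {full_scan_cpu_seconds / 3600:.2f}")
```

Under these assumptions the database spends well under 0.05 bits per genotype, and decoding every genotype once takes under two CPU hours. Queries restricted to a region or a subset of samples touch only a fraction of that total.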
Availability and implementation: https://github.com/lh3/bgt
Contact: hengli@broadinstitute.org