pong: fast analysis and visualization of latent clusters in population genetic data
Motivation: A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, which also includes latent Dirichlet allocation, used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from mixed-membership models are bottlenecks for investigators.

Results: Here, we introduce pong, a network-graphical approach for analyzing and visualizing membership in latent clusters with a D3.js interactive visualization. We apply this new method to 225,705 unlinked genome-wide single-nucleotide variants from 2,426 unrelated individuals in the 1000 Genomes Project, and show that pong outpaces current solutions by more than an order of magnitude in runtime while providing a customizable and interactive visualization of population structure that is more accurate than those produced by current tools.

Availability: pong is freely available and can be installed using the Python package management system pip.
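
One reason repeated runs can yield different output matrices is that cluster labels are arbitrary: two runs may describe the same structure with the columns of the membership matrix in a different order. The Python sketch below illustrates that point on a toy example by searching for the column permutation that best aligns two membership matrices. It is an illustration only and not pong's algorithm; the function names, the similarity measure (one minus the mean absolute difference), and the toy matrices are all assumptions made for this example, and the brute-force search is practical only for a small number of clusters.

```python
from itertools import permutations


def column_similarity(q_a, q_b, col_a, col_b):
    """Hypothetical similarity: 1 minus the mean absolute difference
    between column col_a of q_a and column col_b of q_b."""
    n = len(q_a)
    diff = sum(abs(q_a[i][col_a] - q_b[i][col_b]) for i in range(n))
    return 1.0 - diff / n


def best_alignment(q_a, q_b):
    """Try every column permutation and return (perm, score), where
    perm[a] is the column of q_b matched to column a of q_a and score
    is the mean similarity over matched columns."""
    k = len(q_a[0])
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(k)):
        score = sum(
            column_similarity(q_a, q_b, a, perm[a]) for a in range(k)
        ) / k
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score


# Toy membership matrices (rows: individuals, columns: clusters).
# run_2 holds the same memberships as run_1 with relabeled clusters.
run_1 = [[0.9, 0.1, 0.0],
         [0.2, 0.7, 0.1],
         [0.0, 0.1, 0.9]]
run_2 = [[0.0, 0.9, 0.1],
         [0.1, 0.2, 0.7],
         [0.9, 0.0, 0.1]]

perm, score = best_alignment(run_1, run_2)
print(perm, score)
```

In this toy case the two runs differ only by relabeling, so an alignment exists under which every matched column pair is identical. Runs that converge to genuinely different solutions would not align perfectly under any permutation, which is why such output needs to be quantified before it is visualized.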