Species Identification by Bayesian Fingerprinting: A Powerful Alternative to DNA Barcoding
A number of methods have been developed to use genetic sequence data to identify and delineate species. Some methods are based on heuristics, such as DNA barcoding, which relies on a sequence-distance threshold, while others use Bayesian model comparison under the multispecies coalescent model. Here we use mathematical analysis and computer simulation to demonstrate large differences in the statistical performance of species identification between DNA barcoding and Bayesian inference under the multispecies coalescent model as implemented in the bpp program. We show that a fixed genetic-distance threshold, as used in DNA barcoding, is problematic for delimiting species, even if the threshold is “optimized”, because different species have different population sizes and different divergence times, and therefore display different amounts of intra-species versus inter-species variation. In contrast, bpp can reliably delimit species in such situations with only one locus and rarely supports a wrong assignment with high posterior probability. While under-sampling or rare specimens may pose problems for heuristic methods, bpp can delimit species with high power when multi-locus data are used, even if a species is represented by a single specimen. Finally, we demonstrate that bpp may be powerful for delimiting cryptic species using specimens that are misidentified as a single species in the barcoding library.
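
The argument against a fixed distance threshold can be illustrated with a minimal simulation sketch. It is not the analysis or the simulation design used in this study, and it does not call bpp. It assumes the standard two-sequence coalescent, with theta = 4Nμ and the divergence time tau both measured in expected mutations per site, and it treats the expected divergence 2t as the observed distance, ignoring mutational noise from finite sequence length. The function names, the parameter values, and the threshold of 0.01 are hypothetical choices made only for illustration.

```python
import random

def within_distance(theta, rng):
    # Two sequences from one species coalesce after an exponential waiting
    # time with mean theta/2 (theta = 4*N*mu, time in mutations per site).
    t = rng.expovariate(2.0 / theta)
    return 2.0 * t  # expected divergence between the two sequences

def between_distance(tau, theta_anc, rng):
    # Sequences from different species cannot coalesce before the species
    # split at tau; they then coalesce in the ancestor (mean theta_anc/2).
    t = tau + rng.expovariate(2.0 / theta_anc)
    return 2.0 * t

def error_rates(theta, tau, theta_anc, threshold, n=20000, seed=1):
    # Returns (false split rate, false lump rate) for a distance threshold.
    rng = random.Random(seed)
    false_split = sum(within_distance(theta, rng) > threshold
                      for _ in range(n)) / n
    false_lump = sum(between_distance(tau, theta_anc, rng) <= threshold
                     for _ in range(n)) / n
    return false_split, false_lump

if __name__ == "__main__":
    threshold = 0.01  # hypothetical fixed barcoding threshold
    # Scenario A: small populations, old divergence.
    print("A:", error_rates(theta=0.001, tau=0.02, theta_anc=0.001,
                            threshold=threshold))
    # Scenario B: large populations, recent divergence.
    print("B:", error_rates(theta=0.02, tau=0.002, theta_anc=0.02,
                            threshold=threshold))
```

Under these assumptions the same threshold separates the species almost perfectly in scenario A but frequently splits one species, and lumps two species, in scenario B, because the within-species distance scales with theta while the between-species distance scales with 2·tau plus the ancestral theta.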