Efficient detection of repeating sites to accelerate phylogenetic likelihood calculations
The phylogenetic likelihood function is the major computational bottleneck in maximum likelihood and Bayesian phylogenetic inference. Given the alignment, a tree and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in repeated likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted for improving run-time and, by using appropriate data structures, also reduce memory usage. Here, we present a novel, fast method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement that is attained by our method. Using empirical and real data sets, we show that a prototype implementation of our method yields up to 10-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the phylogenetic likelihood function available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation.