Maximum likelihood estimates of pairwise rearrangement distances

Stuart Serdoz, Attila Egri-Nagy, Jeremy Sumner, Barbara R. Holland, Peter Jarvis, Mark M. Tanaka, Andrew R. Francis

Accurate estimation of evolutionary distances between taxa is important for many phylogenetic reconstruction methods. In the case of bacteria, distances can be estimated using a range of different evolutionary models, from single nucleotide polymorphisms (SNPs) to large-scale genome rearrangements. Most such methods use the minimal distance as a proxy for the true distance, and only occasionally is a correction for the resulting underestimate available, such as the Jukes-Cantor correction for SNP models. For genome rearrangement models such as inversion, however, there is currently no way to correct for such underestimates. Here we introduce a maximum likelihood estimator (MLE) for the inversion distance between a pair of genomes, using the recently introduced group-theoretic approach to modelling inversions. This MLE functions as a corrected distance because it accounts for multiple changes. In particular, we show that because of the way sequences of inversions interact with each other, it is quite possible for minimal distance and MLE distance to order the distances of two genomes from a third differently. This has an obvious implication for the use of minimal distance in phylogeny reconstruction.
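
To make the contrast between minimal distance and a likelihood-based distance concrete, here is a minimal Python sketch on a toy model. It is an illustration under stated assumptions, not the estimator from the paper: the genome is an unsigned circular arrangement of n regions, each event swaps two adjacent regions (a stand-in for a short inversion), every swap is equally likely, and the number of events in time T is Poisson with mean T. The function names (`apply_swap`, `path_counts`, `likelihood`, `minimal_distance`, `ml_estimate`) and the grid search are hypothetical choices made for this example, and the sketch does not identify genomes that differ only by rotation or reflection.

```python
import math


def apply_swap(perm, i):
    """Swap the regions at positions i and i+1 (positions wrap around the circle)."""
    n = len(perm)
    p = list(perm)
    j = (i + 1) % n
    p[i], p[j] = p[j], p[i]
    return tuple(p)


def path_counts(n, max_k):
    """counts[k][g] = number of length-k swap sequences taking the identity to g."""
    current = {tuple(range(n)): 1}
    counts = [current]
    for _ in range(max_k):
        nxt = {}
        for g, c in current.items():
            for i in range(n):
                h = apply_swap(g, i)
                nxt[h] = nxt.get(h, 0) + c
        counts.append(nxt)
        current = nxt
    return counts


def likelihood(g, t, counts, n):
    """P(observe g | elapsed time t): Poisson(k; t) times the fraction of
    length-k swap sequences that end at g, summed over k up to the truncation."""
    total = 0.0
    for k, layer in enumerate(counts):
        paths = layer.get(g, 0)
        if paths:
            total += math.exp(-t) * t**k / math.factorial(k) * paths / n**k
    return total


def minimal_distance(g, counts):
    """Smallest k for which some length-k swap sequence reaches g (None if not found)."""
    for k, layer in enumerate(counts):
        if g in layer:
            return k
    return None


def ml_estimate(g, counts, n, t_max=8.0, step=0.01):
    """Grid argmax of the likelihood on [0, t_max]. An argmax at t_max suggests
    the likelihood has no interior maximum within the searched range."""
    grid = [i * step for i in range(int(round(t_max / step)) + 1)]
    return max(grid, key=lambda t: likelihood(g, t, counts, n))


if __name__ == "__main__":
    n = 5
    counts = path_counts(n, 40)  # 40 steps: Poisson tail is negligible for t <= 8
    genome = apply_swap(apply_swap(tuple(range(n)), 0), 2)
    print("minimal distance:", minimal_distance(genome, counts))
    print("ML estimate of T:", ml_estimate(genome, counts, n))
```

The minimal distance only records the shortest path to the observed genome, whereas the likelihood weights every path length by how many swap sequences of that length reach it. This is why the two quantities can disagree, and why longer histories that partially undo themselves are not ignored.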

Nice algebra, but the paper misses a huge body of previous literature on ML distances using MCMC.

Thanks for the comment – we’ll be adding more context to the paper before we submit it for publication (hopefully very soon!).