Computing the posterior expectation of phylogenetic trees

Philipp Benner, Miroslav Bačák
(Submitted on 16 May 2013)

Inferring phylogenetic trees from multiple sequence alignments often relies on Markov chain Monte Carlo (MCMC) methods to generate tree samples from a posterior distribution. To give a rigorous approximation of the posterior expectation, one needs to compute the mean of the tree samples, so a sound definition of a mean and algorithms for its computation are needed. To the best of our knowledge, no existing method of phylogenetic inference can handle the full set of sample trees, because such trees typically have different topologies. We develop a novel statistical model for the inference of phylogenetic trees based on the tree space due to Billera et al. [2001]. Since it is a Hadamard space, the mean and median are well defined, which we also motivate from a decision-theoretic perspective. The actual approximation of the posterior expectation relies on recent developments in Hadamard spaces (Bačák [2013a], Miller et al. [2012]) and the fast computation of geodesics in tree space (Owen and Provan [2011]), which together make it possible to compute medians and means of trees with different topologies. Our intention is to give a full, self-contained description of the methods required to approximate posterior expectations. We demonstrate these methods on the small ribosomal subunit rRNA sequence alignment. The posterior expectations obtained on this data set are a meaningful summary of the posterior distribution and of the uncertainty about the tree topology.
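To make the idea of a mean in a Hadamard space concrete, here is a minimal Python sketch. It is not the authors' implementation, and it does not work in the full tree space of Billera et al., where geodesics require an algorithm such as that of Owen and Provan. Instead it assumes a toy Hadamard space, a "spider" made of rays glued at a common origin, and approximates the mean with an inductive-mean iteration in the style of Sturm's law of large numbers: repeatedly sample a point and move a fraction 1/(k+1) along the geodesic towards it. The names `dist`, `geodesic_point`, and `inductive_mean` are hypothetical and chosen only for this illustration.

```python
import random

# Toy Hadamard space (an assumption for illustration): a "spider" of rays
# glued at a common origin. A point is (leg, r): the ray index and the
# distance r >= 0 from the origin.


def dist(p, q):
    """Geodesic distance between two points on the spider."""
    (lp, rp), (lq, rq) = p, q
    if lp == lq:
        return abs(rp - rq)
    # Points on different legs are connected through the origin.
    return rp + rq


def geodesic_point(p, q, t):
    """Point at fraction t in [0, 1] along the geodesic from p to q."""
    (lp, rp), (lq, rq) = p, q
    if lp == lq:
        return (lp, (1 - t) * rp + t * rq)
    s = t * (rp + rq)  # distance travelled from p along the geodesic
    if s <= rp:
        return (lp, rp - s)  # still on p's leg
    return (lq, s - rp)  # passed the origin, now on q's leg


def inductive_mean(points, iters=10000, seed=0):
    """Approximate the mean (minimiser of the sum of squared distances)
    by sampling points uniformly at random and taking steps of size
    1/(k+1) along geodesics."""
    rng = random.Random(seed)
    s = points[rng.randrange(len(points))]
    for k in range(1, iters):
        x = points[rng.randrange(len(points))]
        s = geodesic_point(s, x, 1.0 / (k + 1))
    return s


if __name__ == "__main__":
    # Three points at distance 1 on three different legs.
    sample = [(0, 1.0), (1, 1.0), (2, 1.0)]
    print(inductive_mean(sample))
```

In this toy space, samples that lie on different legs pull the mean towards the origin, which loosely mirrors how trees with conflicting topologies can still be averaged once a geodesic between any two of them is available.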
