Alexandra Gavryushkina, David Welch, Tanja Stadler, Alexei Drummond
(Submitted on 18 Jun 2014)
Phylogenetic analyses that include fossils or molecular sequences sampled through time require models that allow one sample to be a direct ancestor of another sample. Because previously available phylogenetic inference tools assume that all samples are tips, they do not allow for this possibility. We have developed and implemented a Bayesian Markov chain Monte Carlo (MCMC) algorithm to infer what we call sampled ancestor trees, that is, trees in which sampled individuals can be direct ancestors of other sampled individuals. We use a family of birth-death models in which individuals may remain in the tree process after being sampled; in particular, we extend the birth-death skyline model [Stadler et al., 2013] to sampled ancestor trees. This method allows the detection of sampled ancestors as well as estimation of the probability that an individual is removed from the process when it is sampled. We show that sampled ancestor birth-death models in which all samples come from different time points are non-identifiable and thus require one parameter to be known in order to infer the other parameters. We apply this method to epidemiological data, where the possibility of sampled ancestors enables us to identify individuals that infected other individuals after being sampled and to infer fundamental epidemiological parameters. We also apply the method to infer divergence times and diversification rates when fossils are included among the species samples, so that fossilisation events are modelled as part of the tree branching process. Such modelling has many advantages, as argued in the literature. The sampler is available as an open-source BEAST2 package (this https URL ancestors/).
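The sampling mechanism described in the abstract, in which an individual may stay in the process after it is sampled, can be illustrated with a small forward simulation. The sketch below is not the authors' MCMC sampler and does not use the BEAST2 package; it is a minimal Gillespie-style simulation under assumed constant rates. The function name `simulate_sampling` and the parameters `birth`, `death`, `sampling`, `removal_prob`, `t_max`, and `max_pop` are hypothetical names chosen for illustration. It only counts sampled individuals that remain in the process; such an individual becomes a sampled ancestor only if it or one of its descendants is sampled later, which this sketch does not track.

```python
import random


def simulate_sampling(birth, death, sampling, removal_prob, t_max,
                      seed=0, max_pop=10_000):
    """Simulate a birth-death process with sampling through time.

    All parameter names are illustrative assumptions, not the paper's API.
    Each living lineage gives birth, dies, or is sampled at the given
    per-lineage rates. On sampling, the lineage is removed with probability
    removal_prob and otherwise stays in the process.

    Returns (n_samples, n_remaining): the number of sampling events and the
    number of sampled individuals that stayed in the process afterwards.
    """
    rng = random.Random(seed)
    total_rate = birth + death + sampling  # assumed to be positive
    n = 1            # number of living lineages
    t = 0.0          # current time
    n_samples = 0
    n_remaining = 0

    # Stop on extinction, at the time horizon, or if the population explodes.
    while 0 < n <= max_pop:
        # Waiting time to the next event among all n lineages.
        t += rng.expovariate(n * total_rate)
        if t > t_max:
            break
        # Choose the event type in proportion to its rate.
        u = rng.random() * total_rate
        if u < birth:
            n += 1
        elif u < birth + death:
            n -= 1
        else:
            n_samples += 1
            if rng.random() < removal_prob:
                n -= 1            # sampled and removed: can only be a tip
            else:
                n_remaining += 1  # sampled but still in the process
    return n_samples, n_remaining


if __name__ == "__main__":
    for r in (1.0, 0.5, 0.0):
        samples, remaining = simulate_sampling(2.0, 1.0, 0.5, r, t_max=5.0, seed=1)
        print(f"removal_prob={r}: samples={samples}, remaining after sampling={remaining}")
```

With `removal_prob=1.0` every sampled individual leaves the process, so all samples are tips, which is the assumption the abstract attributes to earlier tools. Lower values leave sampled individuals in the process, which is what makes sampled ancestors possible.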