Phylogenetic tree shapes resolve disease transmission patterns
Jennifer Gardy, Caroline Colijn
Whole genome sequencing is becoming popular as a tool for understanding outbreaks of communicable diseases, with phylogenetic trees being used to identify individual transmission events or to characterize outbreak-level overall transmission dynamics. Existing methods to infer transmission dynamics from sequence data rely on well-characterised infectious periods, epidemiological and clinical meta-data which may not always be available, and typically require computationally intensive analysis focussing on the branch lengths in phylogenetic trees. We sought to determine whether the topological structures of phylogenetic trees contain signatures of the overall transmission patterns underlying an outbreak. Here we use simulated outbreaks to train and then test computational classifiers. We test the method on data from two real-world outbreaks. We find that different transmission patterns result in quantitatively different phylogenetic tree shapes. We describe five topological features that summarize a phylogeny’s structure and find that computational classifiers based on these are capable of predicting an outbreak’s transmission dynamics. The method is robust to variations in the transmission parameters and network types, and recapitulates known epidemiology of previously characterized real-world outbreaks. We conclude that there are simple structural properties of phylogenetic trees which, when combined, can distinguish communicable disease outbreaks with a super-spreader, homogeneous transmission, and chains of transmission. This is possible using genome data alone, and can be done during an outbreak. We discuss the implications for management of outbreaks.