Shared genomic variants: identification of transmission routes using pathogen deep sequence data
Routes of transmission during an infectious disease outbreak have traditionally been identified through exhaustive contact tracing, but the increasing availability of pathogen sequencing now provides a new resource for identifying plausible routes of infection. However, while transmission clusters can be identified using single genome sequences, individual transmission routes remain uncertain. Deep sequence data may provide additional information where single genomes lack sufficient resolution: minor variants shared between multiple hosts can suggest epidemiological linkage. In this study we formalize shared variant methods for reconstructing the transmission tree of an outbreak and, using simulated outbreak data, quantify their improvement in accuracy over analogous single-genome approaches. Furthermore, we propose a hybrid approach that draws on both deep sequence and single genome data. Our simulation studies show that transmission tree identification methods using shared variants outperform the alternatives in most settings. Applying these methods to deep sequence data collected during the 2014 Sierra Leone Ebola epidemic, we identify plausible transmission routes without any additional data. The methods we describe should become a common step in outbreak investigations and epidemiological analyses as the collection of deep sequence data becomes increasingly widespread.
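The core intuition, that minor variants shared between hosts hint at linkage, can be illustrated with a toy sketch. This is not the formalized method described above. It rests on several assumptions. Each host's deep sequence data is summarized as a hypothetical mapping from variant `(position, allele)` to within-host frequency. A "minor variant" is any variant with frequency between an assumed detection threshold (here 2%) and 50%. The host sharing the most minor variants is taken as the most plausibly linked. The names `minor_variants`, `shared_variant_counts` and `most_linked` are illustrative only.

```python
from itertools import combinations

# Assumed minor-variant band: above a detection threshold, below 50% frequency.
def minor_variants(freqs, threshold=0.02):
    """Return the set of variants observed at minor (intermediate) frequency."""
    return {v for v, f in freqs.items() if threshold <= f < 0.5}

def shared_variant_counts(samples, threshold=0.02):
    """Count minor variants shared by every unordered pair of hosts."""
    minor = {host: minor_variants(f, threshold) for host, f in samples.items()}
    return {
        (a, b): len(minor[a] & minor[b])
        for a, b in combinations(sorted(minor), 2)
    }

def most_linked(samples, host, threshold=0.02):
    """Return the other host sharing the most minor variants with `host`.

    Returns None if no other host shares any minor variant. Note that a
    shared count gives no direction of transmission, only a candidate link.
    """
    best, best_n = None, 0
    for (a, b), n in shared_variant_counts(samples, threshold).items():
        if host in (a, b) and n > best_n:
            best, best_n = (b if a == host else a), n
    return best

# Hypothetical toy data: host -> {(position, allele): within-host frequency}
samples = {
    "A": {(100, "T"): 0.10, (250, "G"): 0.05, (400, "C"): 0.98},
    "B": {(100, "T"): 0.20, (250, "G"): 0.03, (500, "A"): 0.04},
    "C": {(600, "G"): 0.15},
}
print(shared_variant_counts(samples))  # A-B share two minor variants
print(most_linked(samples, "A"))       # B
```

Here the near-fixed variant at position 400 in host A is excluded, because it would typically also appear in a single consensus genome. Only the low-frequency variants, which single genomes cannot capture, contribute to the A–B link.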