Phylogenetic tree shapes resolve disease transmission patterns

Jennifer Gardy, Caroline Colijn

Whole genome sequencing is becoming popular as a tool for understanding outbreaks of communicable diseases, with phylogenetic trees being used to identify individual transmission events or to characterize outbreak-level overall transmission dynamics. Existing methods to infer transmission dynamics from sequence data rely on well-characterised infectious periods, epidemiological and clinical meta-data which may not always be available, and typically require computationally intensive analysis focussing on the branch lengths in phylogenetic trees. We sought to determine whether the topological structures of phylogenetic trees contain signatures of the overall transmission patterns underlying an outbreak. Here we use simulated outbreaks to train and then test computational classifiers. We test the method on data from two real-world outbreaks. We find that different transmission patterns result in quantitatively different phylogenetic tree shapes. We describe five topological features that summarize a phylogeny’s structure and find that computational classifiers based on these are capable of predicting an outbreak’s transmission dynamics. The method is robust to variations in the transmission parameters and network types, and recapitulates known epidemiology of previously characterized real-world outbreaks. We conclude that there are simple structural properties of phylogenetic trees which, when combined, can distinguish communicable disease outbreaks with a super-spreader, homogeneous transmission, and chains of transmission. This is possible using genome data alone, and can be done during an outbreak. We discuss the implications for management of outbreaks.
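The abstract does not name its five topological features, so the following is only a minimal sketch of what a tree-shape summary can look like. It uses the Colless imbalance, a standard shape statistic that may or may not be among the authors' five. The nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
def leaf_count_and_colless(tree):
    """Return (number of leaves, Colless imbalance) for a rooted binary tree.

    Assumed encoding: a leaf is any non-tuple value, and an internal node
    is a 2-tuple (left_subtree, right_subtree).
    """
    if not isinstance(tree, tuple):
        return 1, 0
    left, right = tree
    n_left, c_left = leaf_count_and_colless(left)
    n_right, c_right = leaf_count_and_colless(right)
    # Each internal node contributes the absolute difference in leaf counts
    # between its two subtrees.
    imbalance_here = abs(n_left - n_right)
    return n_left + n_right, c_left + c_right + imbalance_here


def colless(tree):
    """Colless imbalance: 0 for a perfectly balanced tree, larger for ladder-like trees."""
    return leaf_count_and_colless(tree)[1]


# A ladder-like (caterpillar) topology versus a balanced one, four tips each.
ladder_like = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))

print(colless(ladder_like), colless(balanced))
```

The statistic depends only on the branching order, not on branch lengths, which is the kind of information the abstract proposes to use. Whether a given value indicates a particular transmission pattern is a question for the trained classifiers, not something this sketch decides.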

Reassortment between influenza B lineages and the emergence of a co-adapted PB1-PB2-HA gene complex

Gytis Dudas, Trevor Bedford, Samantha Lycett, Andrew Rambaut
Comments: 33 pages, 21 figures
Subjects: Populations and Evolution (q-bio.PE)

Influenza B viruses are increasingly being recognized as major contributors to morbidity attributed to seasonal influenza. Currently circulating influenza B isolates are known to belong to two antigenically distinct lineages referred to as B/Victoria and B/Yamagata. Frequent exchange of genomic segments of these two lineages has been noted in the past, but the observed patterns of reassortment have not been formalized in detail. We investigate inter-lineage reassortments by comparing phylogenetic trees across genomic segments. Our analyses indicate that of the 8 segments of influenza B viruses only PB1, PB2 and HA segments maintained separate Victoria and Yamagata lineages and that currently circulating strains possess PB1, PB2 and HA segments derived entirely from one or the other lineage; other segments have repeatedly reassorted between lineages thereby reducing genetic diversity. We argue that this difference between segments is due to selection against reassortant viruses with mixed lineage PB1, PB2 and HA segments. Given sufficient time and continued recruitment to the reassortment-isolated PB1-PB2-HA gene complex, we expect influenza B viruses to eventually undergo sympatric speciation.

The limits of selection under plant domestication

Robin G. Allaby, Dorian Q. Fuller, James L. Kitchen
Subjects: Populations and Evolution (q-bio.PE)

Plant domestication involved a process of selection through human agency of a series of traits collectively termed the domestication syndrome. Current debate concerns the pace at which domesticated plants emerged from cultivated wild populations and how many genes were involved. Here we present simulations that test how many genes could have been involved by considering the cost of selection. We demonstrate the selection load that can be endured by populations increases with decreasing selection coefficients and greater numbers of loci down to values of about s = 0.005, causing a driving force that increases the number of loci under selection. As the number of loci under selection increases, an effect of co-selection increases resulting in individual unlinked loci being fixed more rapidly in out-crossing populations, representing a second driving force to increase the number of loci under selection. In inbreeding systems co-selection results in interference and reduced rates of fixation but does not reduce the size of the selection load that can be endured. These driving forces result in an optimum pace of genome evolution in which 50-100 loci are the most that could be under selection in a cultivation regime. Furthermore, the simulations do not preclude the existence of selective sweeps but demonstrate that they come at a cost of the selection load that can be endured and consequently a reduction of the capacity of plants to adapt to new environments, which may contribute to the explanation of why selective sweeps have been so rarely detected in genome studies.

Mycobiome of the Bat White Nose Syndrome (WNS) Affected Caves and Mines reveals High Diversity of Fungi and Local Adaptation by the Fungal Pathogen Pseudogymnoascus (Geomyces) destructans

Tao Zhang, Tanya R. Victor, Sunanda S. Rajkumar, Xiaojiang Li, Joseph C. Okoniewski, Alan C. Hicks, April D. Davis, Kelly Broussard, Shannon L. LaDeau, Sudha Chaturvedi, Vishnu Chaturvedi
(Submitted on 3 Mar 2014)

The investigations of the bat White Nose Syndrome (WNS) have yet to provide answers as to how the causative fungus Pseudogymnoascus (Geomyces) destructans (Pd) first appeared in the Northeast and how a single clone has spread rapidly in the US and Canada. We aimed to catalogue Pd and all other fungi (mycobiome) by culture-dependent (CD) and culture-independent (CI) methods in four Mines and two Caves from the epicenter of WNS zoonotic. Six hundred sixty-five fungal isolates were obtained by the CD method, including the live recovery of Pd. Seven hundred three nucleotide sequences that met the definition of operational taxonomic units (OTUs) were recovered by CI methods. Most OTUs belonged to unidentified clones deposited in the databases as environmental nucleic acid sequences (ENAS). The core mycobiome of WNS-affected sites comprised 46 species of fungi from 31 genera recovered in culture, and 17 fungal genera and 31 ENAS identified from clone libraries. Fungi such as Arthroderma spp., Geomyces spp., Kernia spp., Mortierella spp., Penicillium spp., and Verticillium spp. were predominant in culture, while Ganoderma spp., Geomyces spp., Mortierella spp., Penicillium spp. and Trichosporon spp. were abundant in clone libraries. Alpha diversity analyses from CI data revealed that fungal community structure was highly diverse. However, the true species diversity remains undetermined due to undersampling. The frequent recovery of Pd indicated that the pathogen has adapted to WNS-afflicted habitats. Further, this study supports the hypothesis that Pd is an introduced species. These findings underscore the need for integrated WNS control measures that target both bats and the fungal pathogen.

Decoding coalescent hidden Markov models in linear time

Kelley Harris, Sara Sheehan, John A. Kamm, Yun S. Song
(Submitted on 4 Mar 2014)

In many areas of computational biology, hidden Markov models (HMMs) have been used to model local genomic features. In particular, coalescent HMMs have been used to infer ancient population sizes, migration rates, divergence times, and other parameters such as mutation and recombination rates. As more loci, sequences, and hidden states are added to the model, however, the runtime of coalescent HMMs can quickly become prohibitive. Here we present a new algorithm for reducing the runtime of coalescent HMMs from quadratic in the number of hidden time states to linear, without making any additional approximations. Our algorithm can be incorporated into various coalescent HMMs, including the popular method PSMC for inferring variable effective population sizes. Here we implement this algorithm to speed up our demographic inference method diCal, which is equivalent to PSMC when applied to a sample of two haplotypes. We demonstrate that the linear-time method can reconstruct a population size change history more accurately than the quadratic-time method, given similar computation resources. We also apply the method to data from the 1000 Genomes project, inferring a high-resolution history of size changes in the European population.
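To see where the quadratic cost mentioned above comes from, here is a minimal sketch of the textbook HMM forward recursion. It is not the authors' linear-time algorithm, and it is not coalescent-specific: the function name and the toy two-state parameters are assumptions chosen only for illustration.

```python
def forward_likelihood(init, trans, emit, observations):
    """Likelihood of an observation sequence under a generic discrete HMM.

    init[k]      -- probability of starting in hidden state k
    trans[i][j]  -- probability of moving from state i to state j
    emit[k][o]   -- probability of emitting symbol o from state k
    """
    n_states = len(init)
    alpha = [init[k] * emit[k][observations[0]] for k in range(n_states)]
    for obs in observations[1:]:
        # For each of the K target states we sum over all K source states,
        # so each step along the sequence costs O(K^2) operations.
        alpha = [
            emit[j][obs] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Toy two-state model (hypothetical numbers, not taken from the paper).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]

likelihood = forward_likelihood(init, trans, emit, [0, 1])
print(likelihood)
```

The inner sum over source states is the step that a linear-time method would need to avoid. How the paper achieves this is not described in the abstract, so the sketch stops at the standard recursion.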

Conditions for the validity of SNP-based heritability estimation

James J Lee, Carson C Chow

The heritability of a trait ($h^2$) is the proportion of its population variance caused by genetic differences, and estimates of this parameter are important for interpreting the results of genome-wide association studies (GWAS). In recent years, researchers have adopted a novel method for estimating a lower bound on heritability directly from GWAS data that uses realized genetic similarities between nominally unrelated individuals. The quantity estimated by this method is purported to be the contribution to heritability that could in principle be recovered from association studies employing the given panel of SNPs ($h^2_\textrm{SNP}$). Thus far the validity of this approach has mostly been tested empirically. Here, we provide a mathematical explication and show that the method should remain a robust means of obtaining $h^2_\textrm{SNP}$ under circumstances wider than those under which it has so far been derived.
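As a reading aid, the verbal definition in the abstract can be written as a variance ratio. The symbols $\sigma^2_G$ (variance attributable to genetic differences) and $\sigma^2_P$ (total phenotypic variance in the population) are notation assumed here, not taken from the abstract, and the inequality simply restates the abstract's description of the SNP-based quantity as a lower bound.

```latex
h^2 = \frac{\sigma^2_G}{\sigma^2_P},
\qquad
h^2_\textrm{SNP} \le h^2
```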

Most viewed on Haldane’s Sieve: February 2014

The most viewed posts on Haldane’s Sieve last month were: