An improved sequence measure used to scan genomes for regions of recent gene flow
Anthony J. Geneva, Christina A. Muirhead, LeAnne M. Lovato, Sarah B. Kingan, Daniel Garrigan
(Submitted on 6 Mar 2014)
The study of complex speciation, or speciation with gene flow, requires the identification of genomic regions that are either unusually divergent or that have experienced recent gene flow. Furthermore, the rapid growth of population genomic datasets relevant to studying complex speciation requires that analytical tools be scalable to the level of whole-genome analysis. We present a simple sequence measure, Gmin, which is specifically designed to identify regions of diverging genomes as candidates for experiencing recent gene flow. Gmin is defined as the ratio of the minimum number of nucleotide differences between sequences from two different populations to the average number of between-population differences. We compare the sensitivity of Gmin to that of the widely used index of population differentiation, Fst. Extensive computer simulations demonstrate that Gmin has greater sensitivity and specificity to detect gene flow than Fst. Additionally, the sensitivity of Gmin to detect gene flow is robust with respect to both the population mutation and recombination rates, suggesting that it is flexible and can be applied to a variety of biological scenarios. Finally, a scan of Gmin across the X chromosome of Drosophila melanogaster identifies candidate regions of introgression between sub-Saharan African and cosmopolitan populations that were previously missed by other methods. These results demonstrate that Gmin is a biologically straightforward, yet powerful, alternative to Fst, as well as to more computationally intensive model-based methods for detecting gene flow.
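The definition of Gmin translates directly into code. The following is a minimal sketch, assuming the sequences of one genomic window are already aligned strings of equal length and that every mismatch (including gaps or Ns, a simplification) counts as a difference; the function names `hamming` and `g_min` are illustrative and not taken from the authors' software.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count the sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def g_min(pop_x: list[str], pop_y: list[str]) -> float:
    """Gmin for one window: the minimum between-population pairwise difference
    divided by the mean between-population difference (d_XY)."""
    dists = [hamming(x, y) for x, y in product(pop_x, pop_y)]
    d_xy = sum(dists) / len(dists)
    if d_xy == 0:
        raise ValueError("no between-population differences in this window")
    return min(dists) / d_xy


# Hypothetical toy window with two sequences per population:
# pairwise distances are 4, 2, 5 and 1, so min = 1 and mean = 3.
print(g_min(["AAAAAAAA", "AAAAAAAT"], ["CCCCAAAA", "CAAAAAAT"]))  # prints 0.3333333333333333
```

Values near 0 mean that at least one between-population pair is far more similar than the average pair, which is the pattern expected after recent gene flow; a genome scan would compute this statistic window by window.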
A renewal theory approach to IBD sharing
Shai Carmi, Itsik Pe’er
(Submitted on 6 Mar 2014)
Long genomic segments that are nearly identical between a pair of individuals and are inherited from a recent common ancestor without recombination are called identical-by-descent (IBD) segments. IBD sharing has numerous applications in genetics, from demographic inference to phasing, imputation, pedigree reconstruction, and disease mapping. Here, we provide a theoretical analysis of IBD sharing under Markovian approximations of the coalescent with recombination. We describe a general framework for the IBD process along the chromosome under the Markovian models (SMC/SMC’), as well as introduce and justify a new model, which we term the renewal approximation, under which lengths of successive segments are independent. Then, considering the infinite-chromosome limit of the IBD process, we recover previous results (for SMC) and derive new results (for SMC’) for the average fraction of the chromosome found in long shared segments and the average number of such segments. A number of new results for tree heights in SMC’ are proved as lemmas. We then use renewal theory to derive an expression (in Laplace space) for the distribution of the number of shared segments and demonstrate implications for demographic inference. We also use renewal theory to compute the distribution of the fraction of the chromosome shared. While the expression is again in Laplace space, we were able to invert the first two moments and compare a number of approximations. Finally, we generalize all results to populations with variable historical effective size.
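Under the renewal approximation, the IBD process along a chromosome reduces to a sequence of independent segment lengths, so the number of long shared segments and the fraction of the chromosome they cover can also be estimated by Monte Carlo simulation, a useful check on analytical results. A minimal sketch, assuming a user-supplied length distribution: `toy_length` below is a hypothetical placeholder, not the SMC or SMC’ segment-length distribution derived in the paper, and the 20,000-generation mean age and 2-Morgan chromosome are arbitrary illustrative values.

```python
import random


def simulate_renewal_ibd(chrom_len, min_len, draw_length, rng):
    """Tile a chromosome of length chrom_len (Morgans) with segments whose
    lengths are i.i.d. draws, i.e. a renewal process. Return the number of
    segments longer than min_len and the fraction of the chromosome they cover."""
    pos, n_long, long_total = 0.0, 0, 0.0
    while pos < chrom_len:
        seg = draw_length(rng)
        if seg <= 0:
            raise ValueError("segment lengths must be positive")
        seg = min(seg, chrom_len - pos)  # truncate the last segment at the chromosome end
        if seg > min_len:
            n_long += 1
            long_total += seg
        pos += seg
    return n_long, long_total / chrom_len


def toy_length(rng):
    """HYPOTHETICAL segment-length law, for illustration only: draw a segment
    age t (generations, mean 20,000), then a length in Morgans that is
    exponential with rate 2t. Not the distribution derived in the paper."""
    t = max(rng.expovariate(1 / 20_000), 1.0)  # at least one generation
    return rng.expovariate(2 * t)


rng = random.Random(42)
n_long, frac = simulate_renewal_ibd(2.0, 0.02, toy_length, rng)
print(n_long, frac)
```

Repeating the simulation many times gives empirical distributions of both quantities, which could be compared against moments obtained by inverting the Laplace-space expressions.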
Phylogenetic tree shapes resolve disease transmission patterns
Jennifer Gardy, Caroline Colijn
Whole genome sequencing is becoming popular as a tool for understanding outbreaks of communicable diseases, with phylogenetic trees being used to identify individual transmission events or to characterize overall transmission dynamics at the outbreak level. Existing methods to infer transmission dynamics from sequence data rely on well-characterised infectious periods and on epidemiological and clinical metadata, which may not always be available, and typically require computationally intensive analysis focussing on the branch lengths of phylogenetic trees. We sought to determine whether the topological structures of phylogenetic trees contain signatures of the overall transmission patterns underlying an outbreak. Here we use simulated outbreaks to train and then test computational classifiers, and we then test the method on data from two real-world outbreaks. We find that different transmission patterns result in quantitatively different phylogenetic tree shapes. We describe five topological features that summarize a phylogeny’s structure and find that computational classifiers based on these are capable of predicting an outbreak’s transmission dynamics. The method is robust to variations in the transmission parameters and network types, and recapitulates known epidemiology of previously characterized real-world outbreaks. We conclude that there are simple structural properties of phylogenetic trees which, when combined, can distinguish communicable disease outbreaks with a super-spreader, homogeneous transmission, and chains of transmission. This is possible using genome data alone, and can be done during an outbreak. We discuss the implications for management of outbreaks.
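The abstract does not name its five topological features. As an illustration only, the sketch below computes three widely used tree-shape statistics (number of cherries, Colless imbalance, Sackin index) on a rooted binary tree stored as nested tuples; whether these coincide with the authors' chosen features is an assumption, and the tuple representation is hypothetical.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf label, or a (left, right) pair for an internal node


def n_leaves(t: Tree) -> int:
    """Number of tips below (and including) this node."""
    return 1 if isinstance(t, str) else n_leaves(t[0]) + n_leaves(t[1])


def cherries(t: Tree) -> int:
    """Number of internal nodes whose two children are both leaves."""
    if isinstance(t, str):
        return 0
    left, right = t
    if isinstance(left, str) and isinstance(right, str):
        return 1
    return cherries(left) + cherries(right)


def colless(t: Tree) -> int:
    """Colless index: sum over internal nodes of |leaves(left) - leaves(right)|."""
    if isinstance(t, str):
        return 0
    left, right = t
    return abs(n_leaves(left) - n_leaves(right)) + colless(left) + colless(right)


def sackin(t: Tree, depth: int = 0) -> int:
    """Sackin index: sum of root-to-leaf depths, counted in edges."""
    if isinstance(t, str):
        return depth
    return sackin(t[0], depth + 1) + sackin(t[1], depth + 1)


ladder = ("a", ("b", ("c", "d")))
balanced = (("a", "b"), ("c", "d"))
print(cherries(ladder), colless(ladder), sackin(ladder))        # 1 3 9
print(cherries(balanced), colless(balanced), sackin(balanced))  # 2 0 8
```

Ladder-like (imbalanced) trees score high on Colless and Sackin, while balanced trees score low. A vector of such statistics per tree could serve as input to a classifier trained on simulated outbreaks, in the spirit of the approach described above.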
Reassortment between influenza B lineages and the emergence of a co-adapted PB1-PB2-HA gene complex
Gytis Dudas, Trevor Bedford, Samantha Lycett, Andrew Rambaut
Comments: 33 pages, 21 figures
Subjects: Populations and Evolution (q-bio.PE)
Influenza B viruses are increasingly being recognized as major contributors to morbidity attributed to seasonal influenza. Currently circulating influenza B isolates are known to belong to two antigenically distinct lineages referred to as B/Victoria and B/Yamagata. Frequent exchange of genomic segments of these two lineages has been noted in the past, but the observed patterns of reassortment have not been formalized in detail. We investigate inter-lineage reassortments by comparing phylogenetic trees across genomic segments. Our analyses indicate that, of the 8 segments of influenza B viruses, only the PB1, PB2 and HA segments maintained separate Victoria and Yamagata lineages, and that currently circulating strains possess PB1, PB2 and HA segments derived entirely from one or the other lineage; other segments have repeatedly reassorted between lineages, thereby reducing genetic diversity. We argue that this difference between segments is due to selection against reassortant viruses with mixed-lineage PB1, PB2 and HA segments. Given sufficient time and continued recruitment to the reassortment-isolated PB1-PB2-HA gene complex, we expect influenza B viruses to eventually undergo sympatric speciation.
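Once each segment's phylogeny has been used to assign every isolate a lineage label for that segment, the pattern described above can be checked with simple bookkeeping. A minimal sketch with hypothetical labels "Vic" and "Yam" and made-up isolates; it assumes lineage assignment has already been done and is not the authors' tree-comparison procedure.

```python
GENE_COMPLEX = ("PB1", "PB2", "HA")


def has_mixed_complex(isolate: dict[str, str]) -> bool:
    """True if an isolate's PB1, PB2 and HA segments are not all from one lineage."""
    return len({isolate[seg] for seg in GENE_COMPLEX}) > 1


def segments_with_both_lineages(isolates: list[dict[str, str]]) -> set[str]:
    """Segments for which both lineage labels still occur among the isolates."""
    segments = isolates[0].keys()
    return {seg for seg in segments if {iso[seg] for iso in isolates} >= {"Vic", "Yam"}}


# Hypothetical per-segment lineage calls for three isolates.
isolates = [
    {"PB1": "Yam", "PB2": "Yam", "HA": "Yam", "NP": "Vic"},
    {"PB1": "Vic", "PB2": "Vic", "HA": "Vic", "NP": "Vic"},
    {"PB1": "Vic", "PB2": "Vic", "HA": "Vic", "NP": "Vic"},
]
print([has_mixed_complex(i) for i in isolates])       # [False, False, False]
print(sorted(segments_with_both_lineages(isolates)))  # ['HA', 'PB1', 'PB2']
```

In this toy example NP has collapsed to a single lineage through reassortment, while PB1, PB2 and HA still retain both lineages and are never mixed within an isolate.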
The limits of selection under plant domestication
Robin G. Allaby, Dorian Q. Fuller, James L. Kitchen
Subjects: Populations and Evolution (q-bio.PE)
Plant domestication involved a process of selection through human agency of a series of traits collectively termed the domestication syndrome. Current debate concerns the pace at which domesticated plants emerged from cultivated wild populations and how many genes were involved. Here we present simulations that test how many genes could have been involved by considering the cost of selection. We demonstrate that the selection load that can be endured by populations increases with decreasing selection coefficients and greater numbers of loci, down to values of about s = 0.005, creating a driving force that increases the number of loci under selection. As the number of loci under selection increases, an effect of co-selection increases, resulting in individual unlinked loci being fixed more rapidly in out-crossing populations; this represents a second driving force to increase the number of loci under selection. In inbreeding systems, co-selection results in interference and reduced rates of fixation but does not reduce the size of the selection load that can be endured. These driving forces result in an optimum pace of genome evolution in which 50-100 loci are the most that could be under selection in a cultivation regime. Furthermore, the simulations do not preclude the existence of selective sweeps but demonstrate that they come at a cost of the selection load that can be endured and consequently a reduction of the capacity of plants to adapt to new environments, which may help explain why selective sweeps have so rarely been detected in genome studies.
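The argument turns on the cost of selection (selection load). As a textbook-style illustration, not the authors' simulation model, the sketch below uses deterministic haploid selection at unlinked loci with multiplicative fitness; the haploid model, the shared starting frequency, and the parameter values are assumptions chosen for simplicity.

```python
def next_freq(p: float, s: float) -> float:
    """One generation of deterministic haploid selection; the favoured allele has fitness 1 + s."""
    return p * (1 + s) / (1 + s * p)


def locus_load(p: float, s: float) -> float:
    """Load at one locus: 1 - (mean fitness / fitness of the best type)."""
    return 1 - (1 + s * p) / (1 + s)


def total_load(p: float, s: float, n_loci: int) -> float:
    """Load across n unlinked loci with multiplicative fitness, all at frequency p."""
    return 1 - (1 - locus_load(p, s)) ** n_loci


def cumulative_cost(p0: float, p1: float, s: float) -> tuple[int, float]:
    """Generations needed, and summed per-generation load, to raise the favoured allele from p0 to p1."""
    t, cost, p = 0, 0.0, p0
    while p < p1:
        cost += locus_load(p, s)
        p = next_freq(p, s)
        t += 1
    return t, cost


for s in (0.1, 0.01, 0.005):
    t, cost = cumulative_cost(0.01, 0.99, s)
    print(f"s={s}: {t} generations, summed cost {cost:.2f}, "
          f"load with 100 loci at p=0.01: {total_load(0.01, s, 100):.3f}")
```

In this simplified model, the per-generation load grows with s and with the number of loci under selection, whereas the summed cost of a single sweep from p0 to p1 stays roughly the same across values of s (close to ln(0.99/0.01) ≈ 4.6 when s is small), in line with Haldane's classic cost-of-selection argument. Finite population size, linkage and mating system, which the authors' simulations address, are ignored here.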
Mycobiome of the Bat White Nose Syndrome (WNS) Affected Caves and Mines reveals High Diversity of Fungi and Local Adaptation by the Fungal Pathogen Pseudogymnoascus (Geomyces) destructans
Tao Zhang, Tanya R. Victor, Sunanda S. Rajkumar, Xiaojiang Li, Joseph C. Okoniewski, Alan C. Hicks, April D. Davis, Kelly Broussard, Shannon L. LaDeau, Sudha Chaturvedi, Vishnu Chaturvedi
(Submitted on 3 Mar 2014)
Investigations of bat White Nose Syndrome (WNS) have yet to explain how the causative fungus Pseudogymnoascus (Geomyces) destructans (Pd) first appeared in the Northeast and how a single clone has spread rapidly across the US and Canada. We aimed to catalogue Pd and all other fungi (the mycobiome) using culture-dependent (CD) and culture-independent (CI) methods in four mines and two caves at the epicenter of the WNS epizootic. Six hundred sixty-five fungal isolates were obtained by the CD method, including live recovery of Pd. Seven hundred three nucleotide sequences that met the definition of operational taxonomic units (OTUs) were recovered by CI methods. Most OTUs belonged to unidentified clones deposited in the databases as environmental nucleic acid sequences (ENAS). The core mycobiome of WNS-affected sites comprised 46 species of fungi from 31 genera recovered in culture, and 17 fungal genera and 31 ENAS identified from clone libraries. Fungi such as Arthroderma spp., Geomyces spp., Kernia spp., Mortierella spp., Penicillium spp., and Verticillium spp. were predominant in culture, while Ganoderma spp., Geomyces spp., Mortierella spp., Penicillium spp. and Trichosporon spp. were abundant in clone libraries. Alpha diversity analyses of the CI data revealed a highly diverse fungal community. However, the true species diversity remains undetermined because of undersampling. The frequent recovery of Pd indicated that the pathogen has adapted to WNS-afflicted habitats. Further, this study supports the hypothesis that Pd is an introduced species. These findings underscore the need for integrated WNS control measures that target both bats and the fungal pathogen.
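The abstract does not name the alpha-diversity indices used. As an illustration only, the sketch below computes three common ones from OTU abundance counts: Shannon diversity, the bias-corrected Chao1 richness estimator, and Good's coverage. The choice of indices and the toy data are assumptions, not the study's analysis.

```python
import math
from collections import Counter


def otu_counts(labels):
    """Abundance of each OTU from a list of per-sequence OTU assignments."""
    return list(Counter(labels).values())


def shannon(counts):
    """Shannon diversity H' = -sum p_i ln p_i."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def goods_coverage(counts):
    """Good's coverage 1 - F1/N: estimated share of the community already sampled."""
    return 1 - sum(1 for c in counts if c == 1) / sum(counts)


# Hypothetical clone-library OTU assignments (not data from the study).
labels = ["otu1"] * 10 + ["otu2"] * 5 + ["otu3"] * 2 + ["otu4", "otu5", "otu6"]
counts = otu_counts(labels)
print(len(counts), chao1(counts), round(goods_coverage(counts), 2))  # 6 7.5 0.85
```

When Chao1 exceeds the observed number of OTUs and coverage falls below 1, many OTUs are represented by single sequences, which is the signature of undersampling.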
Decoding coalescent hidden Markov models in linear time
Kelley Harris, Sara Sheehan, John A. Kamm, Yun S. Song
(Submitted on 4 Mar 2014)
In many areas of computational biology, hidden Markov models (HMMs) have been used to model local genomic features. In particular, coalescent HMMs have been used to infer ancient population sizes, migration rates, divergence times, and other parameters such as mutation and recombination rates. As more loci, sequences, and hidden states are added to the model, however, the runtime of coalescent HMMs can quickly become prohibitive. Here we present a new algorithm for reducing the runtime of coalescent HMMs from quadratic in the number of hidden time states to linear, without making any additional approximations. Our algorithm can be incorporated into various coalescent HMMs, including the popular method PSMC for inferring variable effective population sizes. We implement this algorithm to speed up our demographic inference method diCal, which is equivalent to PSMC when applied to a sample of two haplotypes. We demonstrate that the linear-time method can reconstruct a population size change history more accurately than the quadratic-time method, given similar computation resources. We also apply the method to data from the 1000 Genomes Project, inferring a high-resolution history of size changes in the European population.
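The speed-up rests on a general principle: if the transition matrix has enough structure, the forward recursion's sum over previous states can be rewritten using shared partial sums, so each site costs O(K) rather than O(K^2) for K hidden states. The sketch below illustrates that principle with a deliberately simpler structure ("stay in the current state with probability 1 - r, otherwise redraw from a distribution q"), which is an assumption for illustration. Coalescent transition matrices in diCal and PSMC have a different structure, and it is the authors' algorithm, not this one, that makes them linear.

```python
import random


def forward_step_dense(alpha, T, emit):
    """Generic forward update alpha'_j = e_j * sum_i alpha_i T[i][j]; O(K^2) per site."""
    K = len(alpha)
    return [emit[j] * sum(alpha[i] * T[i][j] for i in range(K)) for j in range(K)]


def forward_step_structured(alpha, r, q, emit):
    """Same update when T[i][j] = (1 - r) * [i == j] + r * q[j]: because
    sum_i alpha_i T[i][j] = (1 - r) alpha_j + r q_j sum_i alpha_i,
    a single O(K) pass suffices."""
    total = sum(alpha)  # shared by every destination state j
    return [emit[j] * ((1 - r) * alpha[j] + r * q[j] * total) for j in range(len(alpha))]


def structured_matrix(r, q):
    """Dense K x K version of the structured transition matrix, for comparison."""
    K = len(q)
    return [[(1 - r) * (i == j) + r * q[j] for j in range(K)] for i in range(K)]


rng = random.Random(0)
K, r = 8, 0.2
raw = [rng.random() for _ in range(K)]
q = [x / sum(raw) for x in raw]  # redraw distribution, sums to 1
alpha = [rng.random() for _ in range(K)]
emit = [rng.random() for _ in range(K)]
dense = forward_step_dense(alpha, structured_matrix(r, q), emit)
fast = forward_step_structured(alpha, r, q, emit)
print(max(abs(a - b) for a, b in zip(dense, fast)))  # differences come only from floating-point rounding
```

Both functions compute the same forward step; only the structured version avoids materializing and multiplying by the full K x K matrix.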