Worldwide population structure, long term demography, and local adaptation of Helicobacter pylori
Valeria Montano, Xavier Didelot, Matthieu Foll, Bodo Linz, Richard Reinhardt, Sebastian Suerbaum, Yoshan Moodley, Jeffrey David Jensen
Helicobacter pylori is an important human pathogen associated with serious gastric diseases. Owing to its medical importance and close relationship with its human host, understanding genomic patterns of global and local adaptation in H. pylori may be of particular significance for both clinical and evolutionary studies. Here we present the first such whole-genome analysis of 60 globally distributed strains, from which we inferred worldwide population structure and demographic history and shed light on interesting global and local events of positive selection, with particular emphasis on the evolution of San-associated lineages. Our results indicate a more ancient origin for the association of humans and H. pylori than previously thought. We identify several important perspectives for future clinical research on candidate selected regions that include both previously characterized genes (e.g. transcription elongation factor NusA and tumor Necrosis Factor Alpha-Inducing Protein Tipα) and hitherto unknown functional genes.
Character trees from transcriptome data: origin and individuation of morphological characters and the so-called “species signal”
Jacob Musser, Gunter Wagner
We elaborate a framework for investigating the evolutionary history of morphological characters. We argue that morphological character trees generated from transcriptomes provide a useful tool for identifying causal gene expression differences underlying the development and evolution of morphological characters. They also enable rigorous testing of different models of morphological character evolution and origination, including the hypothesis that characters originate via divergence of repeated ancestral characters. Finally, morphological character trees provide evidence that character transcriptomes undergo concerted evolution. We argue that concerted evolution of transcriptomes can explain the so-called “species-specific clustering” found in several recent comparative transcriptome studies. The species signal is the phenomenon that transcriptomes cluster by species rather than character type, even though the characters are older than the respective species. We suggest that concerted gene expression evolution results from mutations that alter gene regulatory network interactions shared by the characters under comparison. Thus, character trees generated from transcriptomes allow us to investigate the variational independence, or individuation, of morphological characters at the level of genetic programs.
Real-time strain typing and analysis of antibiotic resistance potential using Nanopore MinION sequencing
Minh Duc Cao, Devika Ganesamoorthy, Alysha Elliott, Huihui Zhang, Matthew Cooper, Lachlan Coin
Clinical pathogen sequencing has significant potential to drive informed treatment of patients with unknown bacterial infection. However, the lack of rapid sequencing technologies with concomitant analysis has impeded clinical adoption in infection diagnosis. Here we demonstrate that commercially-available Nanopore sequencing devices can identify bacterial species and strain information with less than one hour of sequencing time, initial drug-resistance profiles within 2 hours, and a complete resistance profile within 12 hours. We anticipate these devices and associated analysis methods may become useful clinical tools to guide appropriate therapy in time-critical clinical presentations such as bacteraemia and sepsis.
The Multi-allelic Genetic Architecture of a Variance-heterogeneity Locus for Molybdenum Accumulation Acts as a Source of Unexplained Additive Genetic Variance
Simon K G Forsberg, Matthew E Andreatta, Xin-Yuan Huang, John Danku, David E Salt, Örjan Carlborg
Most biological traits are regulated by both genetic and environmental factors. Individual loci contributing to the phenotypic diversity in a population are generally identified by their contributions to the trait mean. Genome-wide association (GWA) analyses can also detect loci based on variance differences between genotypes and several hypotheses have been proposed regarding the possible genetic mechanisms leading to such signals. Little is, however, known about what causes them and whether this genetic variance-heterogeneity reflects mechanisms of importance in natural populations. Previously, we identified a variance-heterogeneity GWA (vGWA) signal for leaf molybdenum concentrations in Arabidopsis thaliana. Here, fine-mapping of this association to a ~78 kb Linkage Disequilibrium (LD)-block reveals that it emerges from the independent effects of three genetic polymorphisms on the high-variance associated version of this LD-block. By revealing the genetic architecture underlying this vGWA signal, we uncovered the molecular source of a significant amount of hidden additive genetic variation (“missing heritability”). Two of the three polymorphisms on the high-variance LD-block are promoter variants for Molybdate transporter 1 (MOT1), and the third a variant located ~25 kb downstream of this gene. A fourth independent association was also detected ~600 kb upstream of the LD-block. Testing of T-DNA knockout alleles for genes in the associated regions suggest AT2G25660 (unknown function) and AT2G26975 (Copper Transporter 6; COPT6) as the strongest candidates for the associations outside MOT1. Our results show that multi-allelic genetic architectures within a single LD-block can lead to a variance-heterogeneity between genotypes in natural populations. Further they provide novel insights into the genetic regulation of ion homeostasis in A. thaliana, and empirically confirm that variance-heterogeneity based GWA methods are a valuable tool to detect novel associations of biological importance in natural populations.
ReproPhylo: An Environment for Reproducible Phylogenomics
Amir Szitenberg, Max John, Mark L Blaxter, David H Lunt
The reproducibility of experiments is key to the scientific process, and particularly necessary for accurate reporting of analyses in data-rich fields such as phylogenomics. We present ReproPhylo, a phylogenomic analysis environment developed to ensure experimental reproducibility, to facilitate the handling of large-scale data, and to assist methodological experimentation. Reproducibility, and instantaneous repeatability, is built in to the ReproPhylo system, and does not require user intervention or configuration because it stores the experimental workflow as a single, serialized Python object containing explicit provenance and environment information. This ‘single file’ approach ensures the persistence of provenance across iterations of the analysis, with changes automatically managed by the version control program Git. ReproPhylo produces an extensive human-readable report, and generates a comprehensive experimental archive file, both of which are suitable for submission with publications. The system facilitates thorough experimental exploration of both parameters and data. ReproPhylo is a platform independent CC0 python module, and is easily installed as a Docker image, with an Jupyter GUI, or as a slimmer version in a Galaxy distribution.
FIQT: a simple, powerful method to accurately estimate effect sizes in genome scans
Tim B Bigdeli, Donghyung Lee, Brien P Riley, Vladimir I Vladimirov, Ayman H Fanous, Kenneth S Kendler, Silviu-Alin Bacanu
Genome scans, including both genome-wide association studies and deep sequencing, continue to discover a growing number of significant association signals for various traits. However, often variants meeting genome-wide significance criteria explain far less of the overall trait variance than “sub-threshold” association signals. To extract these sub-threshold signals, there is a need for methods which accurately estimate the mean of all (normally-distributed) test-statistics from a genome scan (i.e., Z-scores). This is currently achieved by the difficult procedures of adjusting all Z-score (χ_1^2) statistics for “winner’s curse” (multiple testing). Given that multiple testing adjustments are much simpler for p-values, we propose a method for estimating Z-scores means by i) first adjusting their p-values for multiple testing and then ii) transforming the adjusted p-values to upper tail Z-scores with the sign of the original statistics. Because a False Discovery Rate (FDR) procedure is used for multiple testing adjustment, we denote this method FDR Inverse Quantile Transformation (FIQT). When compared to competitors, e.g. Empirical Bayes (including proposed improvements), FIQT is more i) accurate and ii) computationally efficient by orders of magnitude. Its accuracy advantage is substantial at larger sample sizes and/or moderate numbers of association signals. Practical application of FIQT to Z-scores from the first Psychiatric Genetic Consortium (PGC) schizophrenia predicts a non-trivial fraction of the significant signal regions from the subsequent published PGC schizophrenia studies. Finally, we suggest that FIQT might be i) used to improve subject level risk prediction and ii) further improved by modelling the noncentrality of χ_1^2 statistics.
Roary: Rapid large-scale prokaryote pan genome analysis
Andrew J Page, Carla A Cummins, Martin Hunt, Vanessa K Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Jacqueline A Keane, Julian Parkhill
A typical prokaryote population sequencing study can now consist of hundreds or thousands of isolates. Interrogating these datasets can provide detailed insights into the genetic structure of of prokaryotic genomes. We introduce Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and dispensable accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. Using a single CPU Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors.