Identifying Keystone Species in the Human Gut Microbiome from Metagenomic Timeseries using Sparse Linear Regression

Charles K. Fisher, Pankaj Mehta
(Submitted on 3 Feb 2014)

Human-associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics, and an important goal of microbial ecology is to infer the interactions between species from sequence data. Any algorithm for inferring species interactions must overcome three obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in timeseries models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions. Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with bootstrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct “keystone species”, Bacteroides fragilis and Bacteroides stercoris, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut microbiome.
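
To make the idea of "sparse linear regression with bootstrap aggregation" concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a discrete-time Lotka-Volterra model in which the log ratio of successive abundances is linear in the mean-centered abundances, uses forward stepwise selection as the sparse regression, and takes the median coefficient over random train/test splits as the aggregation step. The function names, the 3% improvement threshold, the half-and-half split, and the use of absolute rather than relative abundances are all illustrative assumptions.

```python
import numpy as np

def simulate(C, x_eq, n_steps, noise=0.1, seed=1):
    """Toy discrete-time Lotka-Volterra trajectory (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_steps, len(x_eq)))
    x[0] = x_eq
    for t in range(n_steps - 1):
        growth = C @ (x[t] - x_eq) + noise * rng.standard_normal(len(x_eq))
        x[t + 1] = x[t] * np.exp(growth)
    return x

def forward_stepwise(X, y, train, test, min_gain=0.03):
    """Greedily add predictors while the held-out error keeps improving."""
    n_feat = X.shape[1]
    active, coef = [], np.zeros(n_feat)
    best_err = np.mean(y[test] ** 2)  # held-out error of the empty model
    while len(active) < n_feat:
        trial = None
        for j in range(n_feat):
            if j in active:
                continue
            cols = active + [j]
            beta, *_ = np.linalg.lstsq(X[train][:, cols], y[train], rcond=None)
            err = np.mean((y[test] - X[test][:, cols] @ beta) ** 2)
            if trial is None or err < trial[0]:
                trial = (err, cols, beta)
        # Stop when the best candidate improves held-out error by < min_gain.
        if trial[0] > (1.0 - min_gain) * best_err:
            break
        best_err, active, beta = trial
        coef[:] = 0.0
        coef[active] = beta
    return coef

def infer_interactions(abund, n_bags=50, seed=0):
    """Median interaction matrix over random train/test splits (bagging)."""
    rng = np.random.default_rng(seed)
    X = abund[:-1] - abund[:-1].mean(axis=0)       # centered abundances x_j(t)
    Y = np.log(abund[1:]) - np.log(abund[:-1])     # log x_i(t+1) - log x_i(t)
    n_t, n_sp = X.shape
    C = np.zeros((n_bags, n_sp, n_sp))
    for b in range(n_bags):
        perm = rng.permutation(n_t)
        train, test = perm[: n_t // 2], perm[n_t // 2:]
        for i in range(n_sp):
            C[b, i] = forward_stepwise(X, Y[:, i], train, test)
    return np.median(C, axis=0)

# Example: three species, with species 0 promoting the growth of species 1.
C_true = np.array([[-0.5, 0.0, 0.0],
                   [0.4, -0.5, 0.0],
                   [0.0, 0.0, -0.5]])
C_hat = infer_interactions(simulate(C_true, np.ones(3), 1000))
print(np.round(C_hat, 2))
```

Taking the median across splits means an interaction survives only if it is selected in most resamples, which is one simple way to suppress spurious links driven by a particular subset of time points.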

A5-miseq: an updated pipeline to assemble microbial genomes from Illumina MiSeq data

David Coil, Guillaume Jospin, Aaron E. Darling
(Submitted on 21 Jan 2014)

Motivation: Open-source bacterial genome assembly remains inaccessible to many biologists due to its complexity. Few software solutions exist that are capable of automating all steps in the process of de novo genome assembly from Illumina data.
Results: A5-miseq can produce high-quality microbial genome assemblies on a laptop computer without any parameter tuning. A5-miseq does this by automating the process of adapter trimming, quality filtering, error correction, contig and scaffold generation, and detection of misassemblies. Unlike the original A5 pipeline, A5-miseq can use long reads from the Illumina MiSeq and read pairing information during contig generation, and it includes several improvements to read trimming. Together these changes result in substantially improved assemblies that recover a more complete set of reference genes than previous methods.
Availability: A5-miseq is licensed under the GPL open source license. Source code and precompiled binaries for Mac OS X 10.6+ and Linux 2.6.15+ are available from this http URL

Reconstructing transmission networks for communicable diseases using densely sampled genomic data: a generalized approach

Colin J. Worby, Philip D. O’Neill, Theodore Kypraios, Julie V. Robotham, Daniela De Angelis, Edward J. P. Cartwright, Sharon J. Peacock, Ben S. Cooper
(Submitted on 8 Jan 2014)

Probabilistic reconstruction of transmission networks for communicable diseases can provide important insights into epidemic dynamics, the effectiveness of infection control measures, and contact patterns in an at-risk population. Whole genome sequencing of pathogens from multiple hosts provides an opportunity to investigate who infected whom with unparalleled resolution. We considered disease outbreaks in a community with high frequency genomic sampling, and formulated stochastic epidemic models to investigate person-to-person transmission, based on genomic and epidemiological data. Our approach, which combines a stochastic epidemic transmission model with a genetic distance model, overcomes key limitations of previous methods by providing a framework with the flexibility to allow for unobserved infection times, multiple independent introductions of the pathogen, and within-host genetic diversity, as well as allowing forward simulation. We defined two genetic models: a transmission diversity model, in which genetic diversity increases along a transmission chain, and an importation structure model, which groups isolates into genetically similar clusters. We evaluated their predictive performance using simulated data, demonstrating high sensitivity and specificity, particularly for rapidly mutating pathogens with low transmissibility. We then analyzed data collected during an outbreak of MRSA in a hospital. We identified three probable transmission events (posterior probability > 0.5) among the twenty observed cases. We estimated that genetic diversity across transmission links was approximately the same as within-host, with an expected 3.9 (95% CrI: 3.3, 4.6) single nucleotide polymorphisms between isolates. Our methodology avoids restrictive assumptions required in many analyses, and has broad applicability to epidemics with densely sampled genomic data.

Bayesian inference of infectious disease transmission from whole genome sequence data

Xavier Didelot, Jennifer Gardy, Caroline Colijn

Genomics is increasingly being used to investigate disease outbreaks, but an important question remains unanswered — how well do genomic data capture known transmission events, particularly for pathogens with long carriage periods or large within-host population sizes? Here we present a novel Bayesian approach to reconstruct densely-sampled outbreaks from genomic data whilst considering within-host diversity. We infer a time-labelled phylogeny using BEAST, then infer a transmission network via a Monte-Carlo Markov Chain. We find that under a realistic model of within-host evolution, reconstructions of simulated outbreaks contain substantial uncertainty even when genomic data reflect a high substitution rate. Reconstruction of a real-world tuberculosis outbreak displayed similar uncertainty, although the correct source case and several clusters of epidemiologically linked cases were identified. We conclude that genomics cannot wholly replace traditional epidemiology, but that Bayesian reconstructions derived from sequence data may form a useful starting point for a genomic epidemiology investigation.

Extensive Phenotypic Changes Associated with Large-scale Horizontal Gene Transfer

Kevin Dougherty, Brian A Smith, Autum F Moore, Shannon Maitland, Chris Fanger, Rachel Murillo, David A Baltrus

Horizontal gene transfer often leads to phenotypic changes within recipient organisms independent of any immediate evolutionary benefits. While secondary phenotypic effects of horizontal transfer (e.g. changes in growth rates) have been demonstrated and studied across a variety of systems using relatively small plasmids and phage, little is known about how the size of the acquired region affects the magnitude or number of such costs. Here we describe an amazing breadth of phenotypic changes which occur after a large-scale horizontal transfer event (~1Mb megaplasmid) within Pseudomonas stutzeri, including sensitization to various stresses as well as changes in bacterial behavior. These results highlight the power of horizontal transfer to shift pleiotropic relationships and cellular networks within bacterial genomes. They also provide an important context for how secondary effects of transfer can bias evolutionary trajectories and interactions between species. Lastly, these results and this system provide a foundation to investigate evolutionary consequences in real time as newly acquired regions are ameliorated and integrated into new genomic contexts.