On the Origins and Control of Community Types in the Human Microbiome

Travis E. Gibson, Amir Bashan, Hong-Tai Cao, Scott T. Weiss, Yang-Yu Liu
(Submitted on 17 Jun 2015)

Microbiome-based stratification of healthy individuals into compositional categories, referred to as “community types”, holds promise for drastically improving personalized medicine. Despite this potential, the existence of community types and the degree of their distinctness have been highly debated. Here we adopted a dynamic systems approach and found that heterogeneity in the interspecific interactions or the presence of strongly interacting species is sufficient to explain community types, independent of the topology of the underlying ecological network. By controlling the presence or absence of these strongly interacting species we can steer the microbial ecosystem to any desired community type. This open-loop control strategy still holds even when the community types are not distinct but appear as dense regions within a continuous gradient. This finding can be used to develop viable therapeutic strategies for shifting the microbial composition to a healthy configuration.

Large-scale Machine Learning for Metagenomics Sequence Classification

Kévin Vervier (CBIO), Pierre Mahé, Maud Tournoud, Jean-Baptiste Veyrieras, Jean-Philippe Vert (CBIO)
(Submitted on 26 May 2015)

Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Due to the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions. In this work, we investigate the potential of modern, large-scale machine learning implementations for taxonomic assignment of next-generation sequencing reads based on their k-mer profile. We show that machine learning-based compositional approaches benefit from increasing the number of fragments sampled from reference genomes to tune their parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning these models involves training a machine learning model on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting models are competitive in terms of accuracy with well-established alignment tools for problems involving a small to moderate number of candidate species, and for reasonable amounts of sequencing errors. We show, however, that compositional approaches are still limited in their ability to deal with problems involving a greater number of species, and are more sensitive to sequencing errors. We finally confirm that compositional approaches achieve faster prediction times, with a gain of 3 to 15 times with respect to the BWA-MEM short read mapper, depending on the number of candidate species and the level of sequencing noise.
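
As a rough illustration of what a "k-mer profile" of a read might look like, here is a minimal Python sketch. It is not the authors' implementation: the function names `kmer_profile` and `dense_vector` are hypothetical, and skipping k-mers with ambiguous bases is an assumption made for this example. It also shows why the feature space grows as 4^k, which is roughly 10^7 dimensions at k = 12.

```python
from collections import Counter
from itertools import product


def kmer_profile(read, k):
    """Count overlapping k-mers in a read.

    Assumption for this sketch: k-mers containing any symbol
    other than A, C, G, T (for example N) are skipped.
    """
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def dense_vector(counts, k):
    """Expand counts into a fixed-order vector over all 4**k k-mers.

    Only practical for small k; at larger k a sparse representation
    would be the more plausible choice.
    """
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


profile = kmer_profile("ACGTACGTNAC", 3)
vector = dense_vector(profile, 3)
```

A vector like this could then be passed to any linear classifier, with one class per candidate taxon. The choice of classifier and training procedure is left open here.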

The infant airway microbiome in health and disease impacts later asthma development

Shu Mei Teo, Danny Mok, Kym Pham, Merci Kusel, Michael Serralha, Niamh Troy, Barbara J Holt, Belinda J Hales, Michael L Walker, Elysia Hollams, Yury H Bochkov, Kristine Grindle, Sebastian L Johnston, James E Gern, Peter D Sly, Patrick G Holt, Kathryn E Holt, Michael Inouye
doi: http://dx.doi.org/10.1101/012070

The nasopharynx (NP) is a reservoir for microbes associated with acute respiratory illnesses (ARI). The development of asthma is initiated during infancy, driven by airway inflammation associated with infections. Here, we report viral and bacterial community profiling of NP aspirates across a birth cohort, capturing all lower respiratory illnesses during their first year. Most infants were initially colonized with Staphylococcus or Corynebacterium before stable colonization with Alloiococcus or Moraxella, with transient incursions of Streptococcus, Moraxella or Haemophilus marking virus-associated ARIs. Our data identify the NP microbiome as a determinant for infection spread to the lower airways, severity of accompanying inflammatory symptoms, and risk for future asthma development. Early asymptomatic colonization with Streptococcus was a strong asthma predictor, and antibiotic usage disrupted asymptomatic colonization patterns.

A robust statistical framework for reconstructing genomes from metagenomic data

Dongwan Don Kang, Jeff Froula, Rob Egan, Zhong Wang
doi: http://dx.doi.org/10.1101/011460

We present software that reconstructs genomes from shotgun metagenomic sequences using a reference-independent approach. This method permits the identification of OTUs in large complex communities where many species are unknown. Binning reduces the complexity of a metagenomic dataset, enabling many downstream analyses previously unavailable. In this study we developed MetaBAT, a robust statistical framework that integrates probabilistic distances of genome abundance with sequence composition for automatic binning. Applying MetaBAT to a human gut microbiome dataset identified 173 highly specific genome bins, including many representing previously unidentified species.

Resolving microbial microdiversity with high accuracy full length 16S rRNA Illumina sequencing

Catherine Burke, Aaron E Darling
doi: http://dx.doi.org/10.1101/010967

We describe a method for sequencing full-length 16S rRNA gene amplicons using the high throughput Illumina MiSeq platform. The resulting sequences have about 100-fold higher accuracy than standard Illumina reads and are chimera filtered using information from a single molecule dual tagging scheme that boosts the signal available for chimera detection. We demonstrate that the data provides fine scale phylogenetic resolution not available from Illumina amplicon methods targeting smaller variable regions of the 16S rRNA gene.

Virulence genes are a signature of the microbiome in the colorectal tumor microenvironment

Michael B Burns, Joshua Lynch, Timothy K Starr, Dan Knights, Ran Blekhman
doi: http://dx.doi.org/10.1101/009431

Background: The human gut microbiome is associated with the development of colon cancer, and recent studies have found changes in the composition of the microbial communities in cancer patients compared to healthy controls. However, host-bacteria interactions are mainly expected to occur in the cancer microenvironment, whereas current studies primarily use stool samples to survey the microbiome. Here, we highlight the major shifts in the colorectal tumor microbiome relative to that of matched normal colon tissue from the same individual. This design allows us to survey the microbial communities in the tumor microenvironment and provides an intrinsic control for environmental and host genetic effects on the microbiome. Results: We characterized the microbiome in 44 primary tumor and 44 patient-matched normal colon tissues. We find that tumors harbor distinct microbial communities compared to nearby healthy tissue. Our results show increased microbial diversity in the tumor microenvironment, with changes in the abundances of commensal and pathogenic bacterial taxa, including Fusobacterium and Providencia. While Fusobacterium has previously been implicated in colorectal cancer (CRC), Providencia is a novel tumor-associated agent, and has several features that make it a potential cancer driver, including a strong immunogenic LPS and an ability to damage colorectal tissue. Additionally, we identified a significant enrichment of virulence-associated genes in the colorectal cancer microenvironment. Conclusions: This work identifies bacterial taxa significantly correlated with colorectal cancer, including a novel finding of an elevated abundance of Providencia in the tumor microenvironment. We also describe several metabolic pathways and enzymes differentially present in the tumor-associated microbiome, and show that the bacterial genes in the tumor microenvironment are enriched for virulence-associated genes from the aggregate microbial community. This virulence enrichment indicates that the microbiome likely plays an active role in colorectal cancer development and/or progression. These results provide a starting point for future prognostic and therapeutic research with the potential to improve patient outcomes.

Bayesian mixture analysis for metagenomic community profiling.

Sofia Morfopoulou, Vincent Plagnol

Deep sequencing of clinical samples is now an established tool for the detection of infectious pathogens, with direct medical applications. The large amount of data generated provides an opportunity to detect species even at very low levels, provided that computational tools can effectively interpret potentially complex metagenomic mixtures. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the lack of completeness of existing databases, in particular for viral pathogens. This interpretation problem can be formulated statistically as a mixture model, where the species of origin of each read is missing, but the complete knowledge of all species present in the mixture helps with the assignment of individual reads. Several analytical tools have been proposed to approximately solve this computational problem. Here, we show that the use of parallel Markov chain Monte Carlo (MCMC) for the exploration of the species space enables the identification of the set of species most likely to contribute to the mixture. The added accuracy comes at a cost of increased computation time. Our approach is useful for solving complex mixtures involving several related species. We designed our method specifically for the analysis of deep transcriptome sequencing datasets and with a particular focus on viral pathogen detection, but the principles are applicable more generally to all types of metagenomic mixtures. The code is available on GitHub (http://github.com/smorfopoulou/metaMix) and the process is currently being implemented in a user-friendly R package (metaMix, to be submitted to CRAN).
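
To make the mixture-model formulation concrete, the following Python sketch estimates species proportions for a fixed set of candidate species, treating each read's species of origin as a latent variable. Note that it uses plain expectation-maximization, not the parallel MCMC search over species sets described in the abstract. The function name `em_mixture_weights` and the input matrix `lik` (the probability of each read given each species) are hypothetical choices for illustration.

```python
def em_mixture_weights(lik, n_iter=200):
    """Estimate mixture proportions by expectation-maximization.

    lik[i][j] is an assumed, precomputed P(read i | species j),
    for example derived from alignment scores.
    """
    n_species = len(lik[0])
    weights = [1.0 / n_species] * n_species
    for _ in range(n_iter):
        totals = [0.0] * n_species
        for row in lik:
            # E-step: posterior probability that this read came from each species
            joint = [w * p for w, p in zip(weights, row)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # read matches no candidate species; ignored here
            for j, value in enumerate(joint):
                totals[j] += value / norm
        # M-step: new proportions are the normalized expected read counts
        total = sum(totals)
        if total == 0.0:
            break
        weights = [t / total for t in totals]
    return weights


# Toy data: 3 reads unique to species 0, 1 read unique to species 1,
# 2 reads matching species 0 and 1 equally well, none matching species 2.
lik = [[1.0, 0.0, 0.0]] * 3 + [[0.0, 1.0, 0.0]] + [[0.5, 0.5, 0.0]] * 2
weights = em_mixture_weights(lik)
```

In this toy case the ambiguous reads are shared out in proportion to the evidence from the unique reads, which is the sense in which knowledge of the whole mixture helps assign individual reads.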