Population genetic analyses of metagenomes reveal extensive strain-level variation in prevalent human-associated bacteria

Stephen Nayfach, Katherine S Pollard
doi: http://dx.doi.org/10.1101/031757

Deep sequencing has the potential to shed light on the functional and phylogenetic heterogeneity of microbial populations in the environment. Here we present PhyloCNV, an integrated computational pipeline for quantifying species abundance and strain-level genomic variation from shotgun metagenomes. Our method leverages a comprehensive database of >30,000 reference genomes, which we accurately clustered into species groups using a panel of universal single-copy genes. Given a shotgun metagenome, PhyloCNV will rapidly and automatically identify gene copy number variants and single-nucleotide variants present in abundant bacterial species. We applied PhyloCNV to >500 faecal metagenomes from the United States, Europe, China, Peru, and Tanzania and present the first global analysis of strain-level variation and biogeography in the human gut microbiome. On average there is 8.5x more nucleotide diversity of strains between different individuals than within individuals, with elevated strain-level diversity in hosts from Peru and Tanzania who live rural lifestyles. For many, but not all, common gut species, a significant proportion of inter-sample strain-level genetic diversity is explained by host geography. Eubacterium rectale, for example, has a highly structured population that tracks with host country, while strains of Bacteroides uniformis and other species are structured independently of their hosts. Finally, we discovered that the gene content of some bacterial strains diverges at short evolutionary timescales during which few nucleotide variants accumulate. These findings shed light on the recent evolutionary history of microbes in the human gut and highlight the extensive differences in the gene content of closely related bacterial strains. PhyloCNV is freely available at: https://github.com/snayfach/PhyloCNV.
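
To make the within-host versus between-host comparison concrete, here is a minimal sketch of how nucleotide diversity could be computed from per-site allele frequencies. It is not PhyloCNV's implementation. The function names, the restriction to biallelic sites, and the absence of any finite-sample correction are all assumptions made for illustration.

```python
def within_diversity(freqs):
    """Mean per-site diversity within one sample.

    freqs: reference-allele frequencies (0..1), one per biallelic site.
    Each site contributes 2p(1-p), the chance that two reads drawn
    at random from the sample carry different alleles.
    """
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def between_diversity(freqs_a, freqs_b):
    """Mean per-site diversity between two samples.

    Each site contributes p(1-q) + q(1-p), the chance that one read
    drawn from sample A and one from sample B carry different alleles.
    Both lists must cover the same sites in the same order.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("samples must cover the same sites")
    pairs = zip(freqs_a, freqs_b)
    return sum(p * (1 - q) + q * (1 - p) for p, q in pairs) / len(freqs_a)


# Hypothetical allele frequencies at three sites in two hosts.
host_a = [0.0, 1.0, 0.5]
host_b = [1.0, 1.0, 0.5]
pi_within = within_diversity(host_a)
pi_between = between_diversity(host_a, host_b)
```

A between-to-within ratio like the 8.5x quoted above would come from averaging such quantities over many host pairs and species. The toy frequencies here are invented and carry no biological meaning.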

On the Origins and Control of Community Types in the Human Microbiome

Travis E. Gibson, Amir Bashan, Hong-Tai Cao, Scott T. Weiss, Yang-Yu Liu
(Submitted on 17 Jun 2015)

Microbiome-based stratification of healthy individuals into compositional categories, referred to as “community types”, holds promise for drastically improving personalized medicine. Despite this potential, the existence of community types and the degree of their distinctness have been highly debated. Here we adopted a dynamic systems approach and found that heterogeneity in the interspecific interactions or the presence of strongly interacting species is sufficient to explain community types, independent of the topology of the underlying ecological network. By controlling the presence or absence of these strongly interacting species we can steer the microbial ecosystem to any desired community type. This open-loop control strategy still holds even when the community types are not distinct but appear as dense regions within a continuous gradient. This finding can be used to develop viable therapeutic strategies for shifting the microbial composition to a healthy configuration.

Large-scale Machine Learning for Metagenomics Sequence Classification

Kévin Vervier (CBIO), Pierre Mahé, Maud Tournoud, Jean-Baptiste Veyrieras, Jean-Philippe Vert (CBIO)
(Submitted on 26 May 2015)

Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Due to the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions. In this work, we investigate the potential of modern, large-scale machine learning implementations for taxonomic assignment of next-generation sequencing reads based on their k-mer profiles. We show that machine learning-based compositional approaches benefit from increasing the number of fragments sampled from reference genomes to tune their parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning these models involves training a machine learning model on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting models are competitive in terms of accuracy with well-established alignment tools for problems involving a small to moderate number of candidate species, and for reasonable amounts of sequencing errors. We show, however, that compositional approaches are still limited in their ability to deal with problems involving a greater number of species, and are more sensitive to sequencing errors. We finally confirm that compositional approaches achieve faster prediction times, with a gain of 3 to 15 times with respect to the BWA-MEM short read mapper, depending on the number of candidate species and the level of sequencing noise.
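
As a rough illustration of the compositional idea, the sketch below turns a read into a normalized k-mer frequency vector, the kind of feature a linear classifier could be trained on. It is not the authors' pipeline. The function name, the handling of ambiguous bases, and the dense list representation are assumptions chosen for readability.

```python
from collections import Counter
from itertools import product


def kmer_profile(read, k=3):
    """Return the normalized k-mer frequency vector of a DNA read.

    The vector has 4**k entries, ordered lexicographically over the
    alphabet A, C, G, T. k-mers containing any other character
    (for example N) are skipped.
    """
    read = read.upper()
    valid = set("ACGT")
    counts = Counter(
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= valid
    )
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts.values())
    if total == 0:
        # Read shorter than k, or made only of ambiguous bases.
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


# Toy read; with k=2 the vector has 16 entries.
profile = kmer_profile("ACGTACGT", k=2)
```

With k = 12 the vocabulary has 4^12 = 16,777,216 entries, which matches the order of 10^7 dimensions mentioned in the abstract. At that size a sparse representation would be far more practical than the dense list used here.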

The infant airway microbiome in health and disease impacts later asthma development

Shu Mei Teo, Danny Mok, Kym Pham, Merci Kusel, Michael Serralha, Niamh Troy, Barbara J Holt, Belinda J Hales, Michael L Walker, Elysia Hollams, Yury H Bochkov, Kristine Grindle, Sebastian L Johnston, James E Gern, Peter D Sly, Patrick G Holt, Kathryn E Holt, Michael Inouye
doi: http://dx.doi.org/10.1101/012070

The nasopharynx (NP) is a reservoir for microbes associated with acute respiratory illnesses (ARI). The development of asthma is initiated during infancy, driven by airway inflammation associated with infections. Here, we report viral and bacterial community profiling of NP aspirates across a birth cohort, capturing all lower respiratory illnesses during their first year. Most infants were initially colonized with Staphylococcus or Corynebacterium before stable colonization with Alloiococcus or Moraxella, with transient incursions of Streptococcus, Moraxella or Haemophilus marking virus-associated ARIs. Our data identify the NP microbiome as a determinant for infection spread to the lower airways, severity of accompanying inflammatory symptoms, and risk for future asthma development. Early asymptomatic colonization with Streptococcus was a strong asthma predictor, and antibiotic usage disrupted asymptomatic colonization patterns.

A robust statistical framework for reconstructing genomes from metagenomic data

Dongwan Don Kang, Jeff Froula, Rob Egan, Zhong Wang
doi: http://dx.doi.org/10.1101/011460

We present software that reconstructs genomes from shotgun metagenomic sequences using a reference-independent approach. This method permits the identification of OTUs in large complex communities where many species are unknown. Binning reduces the complexity of a metagenomic dataset, enabling many downstream analyses previously unavailable. In this study we developed MetaBAT, a robust statistical framework that integrates probabilistic distances of genome abundance with sequence composition for automatic binning. Applying MetaBAT to a human gut microbiome dataset identified 173 highly specific genome bins, including many representing previously unidentified species.

Resolving microbial microdiversity with high accuracy full length 16S rRNA Illumina sequencing

Catherine Burke, Aaron E Darling
doi: http://dx.doi.org/10.1101/010967

We describe a method for sequencing full-length 16S rRNA gene amplicons using the high throughput Illumina MiSeq platform. The resulting sequences have about 100-fold higher accuracy than standard Illumina reads and are chimera filtered using information from a single molecule dual tagging scheme that boosts the signal available for chimera detection. We demonstrate that the data provides fine scale phylogenetic resolution not available from Illumina amplicon methods targeting smaller variable regions of the 16S rRNA gene.

Virulence genes are a signature of the microbiome in the colorectal tumor microenvironment

Michael B Burns, Joshua Lynch, Timothy K Starr, Dan Knights, Ran Blekhman
doi: http://dx.doi.org/10.1101/009431

Background: The human gut microbiome is associated with the development of colon cancer, and recent studies have found changes in the composition of the microbial communities in cancer patients compared to healthy controls. However, host-bacteria interactions are mainly expected to occur in the cancer microenvironment, whereas current studies primarily use stool samples to survey the microbiome. Here, we highlight the major shifts in the colorectal tumor microbiome relative to that of matched normal colon tissue from the same individual, which allows us to survey the microbial communities in the tumor microenvironment and provides an intrinsic control for environmental and host genetic effects on the microbiome. Results: We characterized the microbiome in 44 primary tumor and 44 patient-matched normal colon tissues. We find that tumors harbor distinct microbial communities compared to nearby healthy tissue. Our results show increased microbial diversity in the tumor microenvironment, with changes in the abundances of commensal and pathogenic bacterial taxa, including Fusobacterium and Providencia. While Fusobacterium has previously been implicated in CRC, Providencia is a novel tumor-associated agent, and has several features that make it a potential cancer driver, including a strong immunogenic LPS and an ability to damage colorectal tissue. Additionally, we identified a significant enrichment of virulence-associated genes in the colorectal cancer microenvironment. Conclusions: This work identifies bacterial taxa significantly correlated with colorectal cancer, including a novel finding of an elevated abundance of Providencia in the tumor microenvironment. We also describe several metabolic pathways and enzymes differentially present in the tumor-associated microbiome, and show that the bacterial genes in the tumor microenvironment are enriched for virulence-associated genes from the aggregate microbial community. This virulence enrichment indicates that the microbiome likely plays an active role in colorectal cancer development and/or progression. These results provide a starting point for future prognostic and therapeutic research with the potential to improve patient outcomes.