Population genetic analyses of metagenomes reveal extensive strain-level variation in prevalent human-associated bacteria

Stephen Nayfach, Katherine S Pollard
doi: http://dx.doi.org/10.1101/031757

Deep sequencing has the potential to shed light on the functional and phylogenetic heterogeneity of microbial populations in the environment. Here we present PhyloCNV, an integrated computational pipeline for quantifying species abundance and strain-level genomic variation from shotgun metagenomes. Our method leverages a comprehensive database of >30,000 reference genomes, which we accurately clustered into species groups using a panel of universal single-copy genes. Given a shotgun metagenome, PhyloCNV will rapidly and automatically identify gene copy number variants and single-nucleotide variants present in abundant bacterial species. We applied PhyloCNV to >500 faecal metagenomes from the United States, Europe, China, Peru, and Tanzania and present the first global analysis of strain-level variation and biogeography in the human gut microbiome. On average, there is 8.5x more nucleotide diversity of strains between different individuals than within individuals, with elevated strain-level diversity in hosts from Peru and Tanzania who live rural lifestyles. For many, but not all, common gut species, a significant proportion of inter-sample strain-level genetic diversity is explained by host geography. Eubacterium rectale, for example, has a highly structured population that tracks with host country, while strains of Bacteroides uniformis and other species are structured independently of their hosts. Finally, we discovered that the gene content of some bacterial strains diverges at short evolutionary timescales during which few nucleotide variants accumulate. These findings shed light on the recent evolutionary history of microbes in the human gut and highlight the extensive differences in the gene content of closely related bacterial strains. PhyloCNV is freely available at: https://github.com/snayfach/PhyloCNV.

Evolutionary analysis across mammals reveals distinct classes of long noncoding RNAs

Jenny Chen, Alexander A. Shishkin, Xiaopeng Zhu, Sabah Kadri, Itay Maza, Jacob H Hanna, Aviv Regev, Manuel Garber

An evaluation of transcriptome-based exon capture for frog phylogenomics across multiple scales of divergence (Class: Amphibia, Order: Anura)

Daniel Portik, Lydia Smith, Ke Bi

Rapid Genotype Refinement for Whole-Genome Sequencing Data using Multi-Variate Normal Distributions

Rudy Arthur, Jared O’Connell, Ole Schulz-Trieglaff, Anthony J Cox

Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes.

Tim Putman, Sebastian Burgstaller, Andra Waagmeester, Chunlei Wu, Andrew I Su, Benjamin Good

Surprisingly weak coordination between leaf structure and function among closely-related tomato species

Christopher D Muir, Miquel Ángel Conesa, Emilio Roldán, Arántzazu Molins, Jeroni Galmés

Scan-o-matic: high-resolution microbial phenomics at a massive scale

Martin Zackrisson, Johan Hallin, Lars-Göran Ottosson, Peter Dahl, Esteban Fernandez-Parada, Erik Ländström, Luciano Fernandez-Ricaud, Petra Kaferle, Andreas Skyman, Stig Omholt, Uros Petrovic, Jonas Warringer, Anders Blomberg

Are all global alignment algorithms and implementations correct?

Tomáš Flouri, Kassian Kobert, Torbjørn Rognes, Alexandros Stamatakis

Characterization of expression quantitative trait loci in extensively phenotyped pedigrees ascertained for bipolar disorder

Christine Peterson, Susan Service, Anna Jasinska, Fuying Gao, Ivette Zelaya, Terri Teshiba, Carrie Bearden, Victor Reus, Gabriel Macaya, Carlos López-Jaramillo, Marina Bogomolov, Yoav Benjamini, Eleazar Eskin, Giovanni Coppola, Nelson Freimer, Chiara Sabatti

Human knockouts in a cohort with a high rate of consanguinity

Danesh Saleheen, Pradeep Natarajan, Wei Zhao, Asif Rasheed, Sumeet Khetarpal, Hong-Hee Won, Konrad J Karczewski, Anne H O’Donnell-Luria, Kaitlin E Samocha, Namrata Gupta, Mozzam Zaidi, Maria Samuel, Atif Imran, Shahid Abbas, Faisal Majeed, Madiha Ishaq, Saba Akhtar, Kevin Trindade, Megan Mucksavage, Nadeem Qamar, Khan S Zaman, Zia Yaqoob, Tahir Saghir, Syed NH Rizvi, Anis Memon, Nadeem H Mallick, Mohammad Ishaq, Syed Z Rasheed, Fazal ur Rehman Memon, Khalid Mahmood, Naveeduddin Ahmed, Ron Do, Daniel G MacArthur, Stacey Gabriel, Eric S Lander, Mark J Daly, Philippe Frossard, John Danesh, Daniel J Rader, Sekar Kathiresan

A major goal of biomedicine is to understand the function of every gene in the human genome. Null mutations can disrupt both copies of a given gene in humans, and phenotypic analysis of such ‘human knockouts’ can provide insight into gene function. To date, comprehensive analysis of genes knocked out in humans has been limited by the fact that null mutations are infrequent in the general population, so observing an individual homozygous null for a given gene is exceedingly rare. However, consanguineous unions are more likely to result in offspring who carry homozygous null mutations. In Pakistan, consanguinity rates are notably high. Here, we sequenced the protein-coding regions of 7,078 adult participants living in Pakistan and performed phenotypic analysis to identify homozygous null individuals and to understand the consequences of complete gene disruption in humans. We enumerated 36,850 rare (<1% minor allele frequency) null mutations. These homozygous null mutations led to complete inactivation of 961 genes in at least one participant. Homozygosity for null mutations at APOC3 was associated with absent plasma apolipoprotein C-III levels; at PLA2G7, with absent enzymatic activity of soluble lipoprotein-associated phospholipase A2; at CYP2F1, with higher plasma interleukin-8 concentrations; and at either A3GALT2 or NRG4, with markedly reduced plasma insulin C-peptide concentrations. After physiologic challenge with oral fat, APOC3 knockouts displayed marked blunting of the usual post-prandial rise in plasma triglycerides compared to wild-type family members. These observations provide a roadmap to understand the consequences of complete disruption of a large fraction of genes in the human genome.