# The new science of metagenomics and the challenges of its use in both developed and developing countries

Edi Prifti (MICA), Jean-Daniel Zucker (MSI, UMMISCO, Nutriomique, Eq. 7)
(Submitted on 10 May 2013)

Our view of the microbial world and its impact on human health is changing radically with the ability to sequence uncultured or unculturable microbes sampled directly from their habitats, an ability made possible by fast and cheap next generation sequencing technologies. Such recent developments represent a paradigmatic shift in the analysis of habitat biodiversity, be it the human, soil or ocean microbiome. We review here some research examples and results that indicate the importance of the microbiome in our lives and then discuss some of the challenges faced by metagenomic experiments and the subsequent analysis of the generated data. We then analyze the economic and social impact on genomic medicine and research in both developing and developed countries. We support the idea that there are significant benefits in building capacities for developing high-level scientific research in metagenomics in developing countries. Indeed, the notion that developing countries should wait for developed countries to make advances in science and technology that they later import at great cost has recently been challenged.

# Abundance-weighted phylogenetic diversity measures distinguish microbial community states and are robust to sampling depth

Connor O. McCoy, Frederick A. Matsen IV
(Submitted on 1 May 2013)

In microbial ecology studies, the most commonly used ways of investigating alpha (within-sample) diversity are either to apply count-only measures such as Simpson’s index to Operational Taxonomic Unit (OTU) groupings, or to use classical phylogenetic diversity (PD), which is not abundance-weighted. Although alpha diversity measures that use abundance information in a phylogenetic framework do exist, they are not widely used within the microbial ecology community. The performance of abundance-weighted phylogenetic diversity measures compared to classical discrete measures has not been explored, and the behavior of these measures under rarefaction (sub-sampling) is not yet clear. In this paper we compare the ability of various alpha diversity measures to distinguish between different community states in the human microbiome for three different data sets. We also present and compare a novel one-parameter family of alpha diversity measures, $\mathrm{BWPD}_\theta$, that interpolates between classical phylogenetic diversity (PD) and an abundance-weighted extension of PD. Additionally, we examine the sensitivity of these phylogenetic diversity measures to sampling, via computational experiments and by deriving a closed form solution for the expectation of phylogenetic quadratic entropy under re-sampling. In all three of the datasets considered, an abundance-weighted measure is the best differentiator between community states. OTU-based measures, on the other hand, are less effective in distinguishing community types. In addition, abundance-weighted phylogenetic diversity measures are less sensitive to differing sampling intensity than their unweighted counterparts. Based on these results we encourage the use of abundance-weighted phylogenetic diversity measures, especially for cases such as microbial ecology where species delimitation is difficult.
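To make the idea of interpolating between unweighted and abundance-weighted PD concrete, here is a minimal sketch. The abstract does not give the formula, so the form used below, a sum over edges of `length * (2 * min(D, 1 - D)) ** theta` where `D` is the fraction of reads on the distal side of the edge, is an assumption about the paper's definition and should be checked against the paper before use. The function name `bwpd` and the edge representation (a list of `(length, distal_leaves)` pairs) are hypothetical and chosen only for illustration.

```python
def bwpd(edges, counts, theta):
    """Sketch of a balance-weighted PD (ASSUMED form, not taken from the abstract).

    edges  : list of (branch_length, set_of_leaf_names_on_distal_side)
    counts : dict mapping leaf name -> number of reads observed
    theta  : 0 gives unweighted (unrooted) PD, 1 gives full abundance weighting
    """
    total = sum(counts.values())
    value = 0.0
    for length, distal_leaves in edges:
        # Fraction of all reads that sit on the distal side of this edge.
        d = sum(counts.get(leaf, 0) for leaf in distal_leaves) / total
        balance = 2.0 * min(d, 1.0 - d)
        # Edges with every read on one side contribute nothing, for any theta.
        if balance > 0.0:
            value += length * balance ** theta
    return value


# Toy tree ((A:1,B:1):1,C:2), written out edge by edge.
toy_edges = [
    (1.0, {"A"}),
    (1.0, {"B"}),
    (1.0, {"A", "B"}),
    (2.0, {"C"}),
]
toy_counts = {"A": 2, "B": 2, "C": 4}

pd_unweighted = bwpd(toy_edges, toy_counts, theta=0.0)
pd_weighted = bwpd(toy_edges, toy_counts, theta=1.0)
```

In this toy case the unweighted value is simply the total length of the branches spanned by the observed leaves, while the weighted value shrinks the contribution of branches that separate a small fraction of reads from the rest. Intermediate values of `theta` trade off between the two.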

# Distilled Single Cell Genome Sequencing and De Novo Assembly for Sparse Microbial Communities

Zeinab Taghavi, Narjes S. Movahedi, Sorin Draghici, Hamidreza Chitsaz
(Submitted on 1 May 2013)

Identification of all species in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier as the number of different species is usually much smaller than the number of cells. Here, we present a novel divide-and-conquer method to sequence and de novo assemble the genomes of all of the different species present in a microbial sample with a sequencing cost and computational complexity proportional to the number of species, not the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide-and-conquer method successfully reduces the cost of sequencing in comparison with the naive exhaustive approach.

# Reducing assembly complexity of microbial genomes with single-molecule sequencing

Sergey Koren, Gregory P Harhay, Timothy PL Smith, James L Bono, Dayna M Harhay, D. Scott Mcvey, Diana Radune, Nicholas H Bergman, Adam M Phillippy
(Submitted on 13 Apr 2013)

Background: The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem.
Results: To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These assemblies are also comparable in accuracy to hybrid assemblies including second-generation data.
Conclusions: Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to below \$2,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of complete genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.

# Robust estimation of microbial diversity in theory and in practice

Bart Haegeman, Jérôme Hamelin, John Moriarty, Peter Neal, Jonathan Dushoff, Joshua S. Weitz
(Submitted on 15 Feb 2013)

Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao’s estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics (“Hill diversities”), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao’s estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically-sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity.
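The quantities named in this abstract can be sketched in a few lines. The code below uses the standard textbook definitions: the Hill diversity of order q (q = 0 is species richness, q = 1 is the exponential of Shannon entropy, q = 2 is the inverse Simpson index) and Chao's richness estimator built from the numbers of singletons and doubletons. These are naive plug-in estimates from sample counts, not the lower and upper estimates constructed in the paper. The function names and the toy counts are hypothetical.

```python
import math


def hill_diversity(counts, q):
    """Plug-in Hill diversity of order q computed from raw sample counts."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # The q -> 1 limit is the exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


def chao1(counts):
    """Chao's richness estimate: S_obs + f1^2 / (2 * f2)."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # species seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # species seen exactly twice
    if f2 == 0:
        # Bias-corrected variant, used here only to avoid dividing by zero.
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1 * f1 / (2.0 * f2)


# Toy sample: a few abundant species and a tail of rare ones.
sample = [50, 30, 10, 5, 2, 2, 1, 1, 1]

richness = hill_diversity(sample, 0)
shannon_hill = hill_diversity(sample, 1)
simpson_hill = hill_diversity(sample, 2)
chao_estimate = chao1(sample)
```

Note how the Chao estimate depends only on the rarest observed species, whereas the q = 1 and q = 2 diversities are dominated by the abundant ones. That difference is the intuition behind the abstract's recommendation to prefer Shannon and Simpson diversity over richness.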

# Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets

Adina Chuang Howe, Jason Pell, Rosangela Canino-Koning, Rachel Mackelprang, Susannah Tringe, Janet Jansson, James M. Tiedje, C. Titus Brown
(Submitted on 1 Dec 2012)

Sequencing errors and biases in metagenomic datasets affect coverage-based assemblies and are often ignored during analysis. Here, we analyze read connectivity in metagenomes and identify the presence of problematic and likely a-biological connectivity within metagenome assembly graphs. Specifically, we identify highly connected sequences which join a large proportion of reads within each real metagenome. These sequences show position-specific bias in shotgun reads, suggestive of sequencing artifacts, and are only minimally incorporated into contigs by assembly. The removal of these sequences prior to assembly results in similar assembly content for most metagenomes and enables the use of graph partitioning to decrease assembly memory and time requirements.

# Our paper: Bacterial diversity associated with Drosophila in the laboratory and in the natural environment

For our next guest post, Fabian Staubach and Dmitri Petrov write about their paper (with coauthors), "Bacterial diversity associated with Drosophila in the laboratory and in the natural environment", arXived here.

Host-associated bacterial communities are ubiquitous, have a variety of effects on the host phenotype and play a role in host adaptation to new environments. Some clear examples of such adaptations are known, but generally these are ancient associations between host and symbiont, such as the association between aphids and the obligate symbiotic bacterium Buchnera, which provides the aphid with essential amino acids, or the association between beewolves and Streptomyces, which protects beewolf larvae from fungal infections. We are investigating the potential of bacterial communities to underlie short-term adaptation, using adaptation of D. melanogaster and D. simulans to different fruit as a study system.

As the first step, we profiled the diversity and composition of bacterial communities associated with Drosophila across multiple species, habitats, and substrates. We amplified and sequenced a region of the bacterial ribosomal DNA from whole-body fly samples using 454 technology. We focused on comparing the bacterial communities of the sibling species D. melanogaster and D. simulans in the lab and in an ecologically and evolutionarily relevant setting: their natural environment. In most cases we were able to study flies from these two species collected by aspiration from the same fruit. We also included nine different species spanning the Drosophila phylogeny to test whether phylogenetic distance and distance between bacterial communities are correlated.

We show that natural bacterial communities associated with Drosophila contain more bacterial taxa than previously thought. Comparison to a mammalian fecal data set reveals that although mammal-associated bacterial communities are more diverse on average, the diversity of some mammalian fecal samples lies within the range of, or is even lower than, that of the Drosophila samples we analyzed. This finding is interesting because it has been a matter of debate whether organisms with an adaptive immune system can in general accommodate higher bacterial diversity. By comparing the bacterial communities of D. melanogaster and D. simulans collected directly from different natural food substrates, we demonstrate that bacterial communities differ primarily between substrates and only very weakly between fly species.

We find acetic acid bacteria of the genera Acetobacter and Gluconobacter to be associated with all wild-caught flies, constituting two thirds of all sequences. Acetic acid bacteria oxidize sugars and ethanol to acetic acid and are known to be directly involved in the development of a specific process of decay called ‘sour rot’ on grapes that causes wine spoilage. There is previous evidence that Drosophila is vital for the dispersal of acetic acid bacteria among rotting fruit: grapes covered with nets in the field do acquire yeasts, but no acetic acid bacteria, and acetic acid bacteria thrive on grapes only when flies are present. At the same time, Acetobacter has been shown to promote Drosophila larval growth and shorten development time under certain nutritional conditions. Therefore, we argue that the relationship between Acetobacteraceae and Drosophila is likely mutualistic.

Individual natural fly samples are dominated by bacteria known to be pathogenic in Drosophila, such as Enterococcus and Providencia. These bacteria are known to reach very high cell counts during systemic infections of Drosophila and we believe that the inclusion of systemically infected flies in these samples is the most likely explanation for the observed pattern. The observation that it is in principle possible to identify potential candidate pathogens in natural populations using standard, high throughput microbial community screening techniques opens up opportunities for large scale epidemiological studies in nature and can help to identify candidate pathogenic bacterial species for further investigation in the laboratory.

In the laboratory, fly-associated bacterial communities are similar irrespective of phylogenetic distance between fly species, suggesting that host genetic factors either play a minor role in shaping the bacterial communities associated with Drosophila or, as suggested by the difference in bacterial communities between D. melanogaster and D. simulans in the wild, require natural conditions to manifest themselves. High variability of Drosophila bacterial communities within and between laboratories is a potential source of experimental noise when studying phenotypic variation. The impact of microbes on Drosophila phenotypes ranges from influencing growth to cold tolerance, and it is hard to imagine traits that are not subject in principle to alteration by microbes.

We hope that our data will serve as a solid foundation for future studies especially for the growing community of scientists that are interested in the microbial communities that are associated with Drosophila.

Fabian Staubach and Dmitri Petrov

# Bacterial diversity associated with Drosophila in the laboratory and in the natural environment

Fabian Staubach, John F. Baines, Sven Kuenzel, Elisabeth M. Bik, Dmitri A. Petrov
(Submitted on 14 Nov 2012)

All higher organisms are associated with bacterial communities. Bacteria have a range of effects on their metazoan hosts from being indispensable for survival to being lethal pathogens. Because bacteria have phenotypic effects on their hosts, they can also be involved in host adaptation to the environment. The fruit fly Drosophila is a classic model organism to study adaptation as well as the relationship between genetic variation and phenotypes. Recently, Drosophila has received attention in immunology and studies of host-microbe interaction. Although bacterial communities associated with Drosophila might be important for many aspects of Drosophila biology, little is known about their diversity and composition or the factors shaping these communities. We used 454-based sequencing of a variable region of the bacterial 16S ribosomal gene to characterize the bacterial communities associated with wild and laboratory Drosophila isolates. In order to specifically investigate effects of food source and host species on bacterial communities, we analyzed samples from wild Drosophila melanogaster and D. simulans flies collected from a variety of natural substrates, as well as from adults and larvae of nine laboratory-reared Drosophila species. We find substantial variation of bacterial communities within and between laboratories that could interfere with phenotype studies. We show that bacterial communities associated with wild-caught Drosophila contain more bacterial species than laboratory-raised flies, but that they are on average less diverse than vertebrate communities. The natural Drosophila-associated microbiota appears to be predominantly shaped by food substrate with an additional but smaller effect of host species identity.

# A 454 survey of the community composition and core microbiome of the common bed bug, Cimex lectularius, reveals significant microbial community structure across an urban landscape

Matthew Meriweather, Sara Matthews, Rita Rio, Regina S Baucom
(Submitted on 13 Oct 2012)

Elucidating the spatial dynamics and core constituents of the microbial communities found in association with arthropod hosts is of crucial importance for insects that may vector human or agricultural pathogens. The hematophagous Cimex lectularius, known as the common bed bug, has made a recent resurgence in North America, as well as worldwide, potentially owing to increased travel and resistance to insecticides. A comprehensive survey of the bed bug microbiome has not been performed to date, nor has an assessment of the spatial dynamics of its microbiome. Here we present a survey of bed bug microbial communities by amplifying the V4-V6 hypervariable region of the 16S rDNA gene followed by 454 Titanium sequencing using 31 individuals from eight natural populations collected from residences in Cincinnati, OH. Across all samples, 97% of the microbial community is made up of two dominant OTUs identified as the $\alpha$-proteobacterium Wolbachia and an unnamed $\gamma$-proteobacterium from the Enterobacteriaceae. Microbial communities varied among host populations for measures of community diversity and exhibited significant population structure. We also uncovered a strong negative correlation in the abundance of the two dominant OTUs, suggesting they may fulfill similar roles as nutritional mutualists. This broad survey represents the most comprehensive assessment, to date, of the microbes that associate with bed bugs, and uncovers evidence for potential antagonism between the two dominant members of the bed bug microbiome.