The new science of metagenomics and the challenges of its use in both developed and developing countries

Edi Prifti (MICA), Jean-Daniel Zucker (MSI, UMMISCO, Nutriomique, Eq. 7)
(Submitted on 10 May 2013)

Our view of the microbial world and its impact on human health is changing radically with the ability to sequence uncultured or unculturable microbes sampled directly from their habitats, an ability made possible by fast and cheap next-generation sequencing technologies. Such recent developments represent a paradigmatic shift in the analysis of habitat biodiversity, be it the human, soil or ocean microbiome. We review here some research examples and results that indicate the importance of the microbiome in our lives and then discuss some of the challenges faced by metagenomic experiments and the subsequent analysis of the generated data. We then analyze the economic and social impact on genomic medicine and research in both developing and developed countries. We support the idea that there are significant benefits in building capacity for high-level scientific research in metagenomics in developing countries. Indeed, the notion that developing countries should wait for developed countries to make advances in science and technology that they later import at great cost has recently been challenged.

SARS-CoV originated from bats in 1998 and may still exist in humans

Ailin Tao, Yuyi Huang, Peilu Li, Jun Liu, Nanshan Zhong, Chiyu Zhang
(Submitted on 13 May 2013)

SARS-CoV is believed to have originated from civets and was thought to have been eliminated as a threat after the 2003 outbreak. Here, we show that human SARS-CoV (huSARS-CoV) originated directly from bats, rather than civets, by a cross-species jump in 1991, and formed a human-adapted strain in 1998. Since then, huSARS-CoV has evolved further into highly virulent strains with genotype T and a 29-nt deletion mutation, and weakly virulent strains with genotype C but without the 29-nt deletion. The former can cause pneumonia in humans and could be the major causative pathogen of the SARS outbreak, whereas the latter might not cause pneumonia in humans but evolved the ability to co-utilize civet ACE2 as an entry receptor, leading to interspecies transmission between humans and civets. Three crucial time points (1991, for the cross-species jump from bats to humans; 1998, for the formation of the human-adapted SARS-CoV; and 2003, when SARS broke out in humans) were found to be associated with anomalously low annual precipitation and high temperatures in Guangdong. Anti-SARS-CoV seropositivity was detected in 20% of all the samples tested from Guangzhou children born after 2005, suggesting that weakly virulent huSARS-CoVs might still exist in humans. These existing but undetected SARS-CoVs have a large potential to evolve into highly virulent strains when favorable climate conditions occur, highlighting a potential risk for the reemergence of SARS.

The deleterious mutation load is insensitive to recent population history

Yuval B. Simons, Michael C. Turchin, Jonathan K. Pritchard, Guy Sella
(Submitted on 9 May 2013)

Human populations have undergone dramatic changes in population size in the past 100,000 years, including a severe bottleneck of non-African populations and recent explosive population growth. There is currently great interest in how these demographic events may have affected the burden of deleterious mutations in individuals and the allele frequency spectrum of disease mutations in populations. Here we use population genetic models to show that, contrary to previous conjectures, recent human demography has likely had very little impact on the average burden of deleterious mutations carried by individuals. This prediction is supported by exome sequence data showing that African American and European American individuals carry very similar burdens of damaging mutations. We next consider whether recent population growth has increased the importance of very rare mutations in complex traits. Our analysis predicts that for most classes of disease variants, rare alleles are unlikely to contribute a large fraction of the total genetic variance, and that the impact of recent growth is likely to be modest. However, for diseases that have a direct impact on fitness, strongly deleterious rare mutations likely do play important roles, and the impact of very rare mutations will be far greater as a result of recent growth. In summary, demographic history has dramatically impacted patterns of variation in different human populations, but these changes have likely had little impact on either genetic load or on the importance of rare variants for most complex traits.
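A classical deterministic result helps make the load claim concrete. In a one-locus, infinite-population model of mutation-selection balance (the Haldane-Muller principle), a partially dominant deleterious allele settles near frequency q ≈ μ/(hs). The load it imposes, 1 - w̄ ≈ 2μ (where w̄ is mean fitness), therefore depends on the mutation rate but not on the selection coefficient, and population size does not appear at all. The sketch below iterates that textbook recursion. It is not the authors' population genetic model, and the values of μ, h and s are arbitrary illustrations.

```python
def equilibrium_load(mu: float, h: float, s: float, generations: int = 100_000) -> tuple[float, float]:
    """Iterate a one-locus, two-allele diploid model to mutation-selection balance.

    Fitnesses: AA = 1, Aa = 1 - h*s, aa = 1 - s, where 'a' is the deleterious
    allele. Returns (q, load), with load = 1 - mean fitness. The model is
    deterministic (infinite population), so population size never enters.
    """

    def mean_fitness(q: float) -> float:
        p = 1.0 - q
        return p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)

    q = 0.0
    for _ in range(generations):
        p = 1.0 - q
        # Selection: frequency of 'a' among gametes from surviving adults.
        q_sel = (p * q * (1 - h * s) + q * q * (1 - s)) / mean_fitness(q)
        # One-way mutation A -> a at rate mu per generation.
        q = q_sel + mu * (1 - q_sel)
    return q, 1.0 - mean_fitness(q)


mu, h = 1e-6, 0.5
for s in (0.001, 0.01, 0.1):
    q, load = equilibrium_load(mu, h, s)
    print(f"s={s:<6} q={q:.3e} (mu/hs={mu / (h * s):.3e})  load/(2*mu)={load / (2 * mu):.4f}")
```

Across this hundredfold range of s, the equilibrium frequency q scales as 1/s, while load/(2*mu) stays close to 1.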

Statistical Physics of Evolutionary Trajectories on Fitness Landscapes

Michael Manhart, Alexandre V. Morozov
(Submitted on 6 May 2013)

Random walks on multidimensional nonlinear landscapes are of interest in many areas of science and engineering. In particular, properties of adaptive trajectories on fitness landscapes determine population fates and thus play a central role in evolutionary theory. The topography of fitness landscapes and its effect on evolutionary dynamics have been extensively studied in the literature. We survey current knowledge in this field, focusing on a recently developed systematic approach to characterizing path lengths, mean first-passage times, and other statistics of the path ensemble. This approach, based on general techniques from statistical physics, is applicable to landscapes of arbitrary complexity and structure. It is especially well suited to quantifying the diversity of stochastic trajectories and the repeatability of evolutionary events. We demonstrate this methodology using a biophysical model of protein evolution that describes how proteins maintain stability while evolving new functions.
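One of the path statistics mentioned above, the mean first-passage time, can be computed exactly for small landscapes using a standard absorbing Markov chain argument. The sketch below assumes a simple uphill walk on the binary hypercube of L loci. Each step goes to a fitter single mutant with probability proportional to its fitness gain, and local peaks are absorbing. If Q is the transition matrix restricted to non-peak genotypes, the expected numbers of steps t solve (I - Q)t = 1. The step rule and the toy landscape are our assumptions, not the authors' biophysical protein model or their path-ensemble formalism.

```python
import numpy as np


def uphill_transition_matrix(fitness: np.ndarray, L: int) -> np.ndarray:
    """Transition matrix of an uphill adaptive walk on the binary hypercube {0,1}^L.

    Genotype g (an integer whose bits are the L loci) steps to the single
    mutant g ^ (1 << b) with probability proportional to its fitness gain;
    genotypes with no fitter neighbour (local peaks) are absorbing.
    """
    n = 2 ** L
    P = np.zeros((n, n))
    for g in range(n):
        gains = np.array([max(0.0, fitness[g ^ (1 << b)] - fitness[g]) for b in range(L)])
        total = gains.sum()
        if total == 0.0:
            P[g, g] = 1.0  # local peak: the walk stops here
            continue
        for b in range(L):
            P[g, g ^ (1 << b)] = gains[b] / total
    return P


def mean_first_passage_times(P: np.ndarray) -> np.ndarray:
    """Expected number of steps until absorption at a peak, for every start genotype.

    With Q the transient-to-transient block of P, the times t solve (I - Q) t = 1.
    """
    transient = ~np.isclose(np.diag(P), 1.0)
    Q = P[np.ix_(transient, transient)]
    times = np.zeros(P.shape[0])
    times[transient] = np.linalg.solve(np.eye(Q.shape[0]) - Q, np.ones(Q.shape[0]))
    return times


# Toy additive landscape: fitness = number of 1 bits, single peak at 111.
L = 3
fitness = np.array([bin(g).count("1") for g in range(2 ** L)], dtype=float)
times = mean_first_passage_times(uphill_transition_matrix(fitness, L))
print(times)
```

On this additive landscape every step gains exactly one unit of fitness. The walk from genotype g therefore needs exactly L minus the number of 1 bits of g steps, and the solver reproduces that count. On a rugged landscape with several peaks, the returned values are expected steps to absorption at whichever peak is reached first.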

Response to No gene-specific optimization of mutation rate in Escherichia coli

Inigo Martincorena, Nicholas M. Luscombe
(Submitted on 7 May 2013)

In a letter published in Molecular Biology and Evolution [10], Chen and Zhang argue that the variation of the mutation rate along the Escherichia coli genome that we recently reported [3] cannot be evolutionarily optimised. To support this claim, they first attempt to calculate the selective advantage of a local reduction in the mutation rate and conclude that it is not strong enough to be favoured by selection. Second, they analyse the distribution of 166 mutations from a wild-type E. coli K12 MG1655 strain and 1,346 mutations from a repair-deficient strain, and claim to find a positive association between transcription and mutation rate rather than the negative association that we reported. Here we respond to this communication. Briefly, we explain how the long-standing theory of mutation-modifier alleles supports the evolution of local mutation rates within a genome by mechanisms acting on sufficiently large regions of a genome, which is consistent with our original observations [3,4]. We then explain why caution must be exercised when comparing mutations from repair-deficient strains to data from wild-type strains, as different mutational processes dominate under these conditions. Finally, a reanalysis of the data used by Chen and Zhang with an alternative expression dataset reveals that their conclusions are unreliable.

Inference in Kingman’s Coalescent with Particle Markov Chain Monte Carlo Method

Yifei Chen, Xiaohui Xie
(Submitted on 3 May 2013)

We propose a new algorithm for posterior sampling of Kingman’s coalescent, based on the Particle Markov Chain Monte Carlo methodology. Specifically, the algorithm is an instantiation of the Particle Gibbs Sampling method, which alternately samples coalescent times conditioned on coalescent tree structures, and tree structures conditioned on coalescent times via the conditional Sequential Monte Carlo procedure. We implement our algorithm as a C++ package and demonstrate its utility via a parameter estimation task in population genetics on both single- and multiple-locus data. Experimental results show that the proposed algorithm performs comparably to or better than several well-developed methods.
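The proposed sampler targets a posterior over coalescent trees whose prior is Kingman's coalescent. For orientation, the sketch below draws only from that prior. While k lineages remain, the next merger occurs after an exponential waiting time with rate k(k-1)/2 in coalescent units and joins a uniformly chosen pair. It is not the Particle Gibbs sampler or the authors' C++ package, and the function name and event format are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Event = Tuple[float, int, int, int]  # (time, child_a, child_b, new_parent)


def sample_kingman(n: int, rng: Optional[random.Random] = None) -> List[Event]:
    """Draw one genealogy of n sampled lineages from Kingman's coalescent.

    While k lineages remain, the waiting time to the next merger is
    exponential with rate k(k-1)/2 (coalescent time units), and the merging
    pair is chosen uniformly among all k(k-1)/2 pairs.
    """
    rng = rng or random.Random()
    lineages = list(range(n))  # leaves are labelled 0..n-1
    next_id = n                # internal nodes get labels n, n+1, ...
    t = 0.0
    events: List[Event] = []
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_id)
        events.append((t, a, b, next_id))
        next_id += 1
    return events


rng = random.Random(1)
reps = 20_000
mean_tmrca = sum(sample_kingman(10, rng)[-1][0] for _ in range(reps)) / reps
print(f"mean TMRCA for n=10: {mean_tmrca:.3f} (theory: {2 * (1 - 1 / 10):.3f})")
```

The last event's time is the time to the most recent common ancestor (TMRCA). Its Monte Carlo mean should approach the known expectation 2(1 - 1/n).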

Isoform reconstruction using short RNA-Seq reads by maximum likelihood is NP-hard

Tianyang Li, Rui Jiang, Xuegong Zhang
(Submitted on 4 May 2013)

Maximum likelihood is a popular technique for isoform reconstruction. Here, we show that isoform reconstruction using short RNA-Seq reads by maximum likelihood is NP-hard.

The role of twitter in the life cycle of a scientific publication

Emily S. Darling, David Shiffman, Isabelle M. Côté, Joshua A. Drew
(Submitted on 2 May 2013)

Twitter is a micro-blogging social media platform for short messages that can have a long-term impact on how scientists create and publish ideas. We investigate the usefulness of Twitter in the development and distribution of scientific knowledge. At the start of the life cycle of a scientific publication, Twitter provides a large virtual department of colleagues that can help to rapidly generate, share and refine new ideas. As ideas become manuscripts, Twitter can be used as an informal arena for the pre-review of works in progress. Finally, tweeting published findings can communicate research to a broad audience of other researchers, decision makers, journalists and the general public, amplifying the scientific and social impact of publications. However, there are limitations, largely surrounding issues of intellectual property and ownership, inclusiveness, and the misrepresentation of science in sound bites. Nevertheless, we believe Twitter is a useful social media tool that can make a valuable contribution to scientific publishing in the 21st century.

Abundance-weighted phylogenetic diversity measures distinguish microbial community states and are robust to sampling depth

Connor O. McCoy, Frederick A. Matsen IV
(Submitted on 1 May 2013)

In microbial ecology studies, the most commonly used ways of investigating alpha (within-sample) diversity are either to apply count-only measures such as Simpson’s index to Operational Taxonomic Unit (OTU) groupings, or to use classical phylogenetic diversity (PD), which is not abundance-weighted. Although alpha diversity measures that use abundance information in a phylogenetic framework do exist, they are not widely used within the microbial ecology community. The performance of abundance-weighted phylogenetic diversity measures compared to classical discrete measures has not been explored, and the behavior of these measures under rarefaction (sub-sampling) is not yet clear. In this paper we compare the ability of various alpha diversity measures to distinguish between different community states in the human microbiome for three different data sets. We also present and compare a novel one-parameter family of alpha diversity measures, BWPD_θ, that interpolates between classical phylogenetic diversity (PD) and an abundance-weighted extension of PD. Additionally, we examine the sensitivity of these phylogenetic diversity measures to sampling, via computational experiments and by deriving a closed-form solution for the expectation of phylogenetic quadratic entropy under re-sampling. In all three of the data sets considered, an abundance-weighted measure is the best differentiator between community states. OTU-based measures, on the other hand, are less effective in distinguishing community types. In addition, abundance-weighted phylogenetic diversity measures are less sensitive to differing sampling intensity than their unweighted counterparts. Based on these results, we encourage the use of abundance-weighted phylogenetic diversity measures, especially for cases such as microbial ecology where species delimitation is difficult.
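The sketch below makes the distinction between count-only, presence-only phylogenetic, and abundance-weighted phylogenetic measures concrete by computing three standard quantities on a toy tree:

- the Gini-Simpson form of Simpson's index, which uses counts only;
- classical rooted PD, which uses the tree but ignores abundances;
- Rao's phylogenetic quadratic entropy, the sum over i, j of d_ij p_i p_j with patristic distances d_ij, which uses both the tree and the abundances.

The tree encoding, taxon names and counts are hypothetical. Rooted PD is only one of several PD conventions, and the paper's BWPD_θ family is not reproduced here.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# Hypothetical rooted tree: node -> (parent, length of the branch above node).
# The root ("R" below) has no entry.
Tree = Dict[str, Tuple[str, float]]


def edges_to_root(node: str, tree: Tree) -> List[str]:
    """Edges (each named by its child node) on the path from node up to the root."""
    edges = []
    while node in tree:
        edges.append(node)
        node = tree[node][0]
    return edges


def phylogenetic_diversity(present: Iterable[str], tree: Tree) -> float:
    """Classical rooted PD: total length of the union of root-to-taxon paths (abundances ignored)."""
    edges = set()
    for taxon in present:
        edges.update(edges_to_root(taxon, tree))
    return sum(tree[e][1] for e in edges)


def patristic(a: str, b: str, tree: Tree) -> float:
    """Path length between two taxa: edges lying on exactly one of the two root paths."""
    separating = set(edges_to_root(a, tree)) ^ set(edges_to_root(b, tree))
    return sum(tree[e][1] for e in separating)


def quadratic_entropy(counts: Dict[str, int], tree: Tree) -> float:
    """Rao's quadratic entropy Q = sum_{i,j} d_ij p_i p_j (d_ii = 0), abundance-weighted."""
    total = sum(counts.values())
    p = {k: v / total for k, v in counts.items()}
    return 2 * sum(patristic(a, b, tree) * p[a] * p[b] for a, b in combinations(p, 2))


def gini_simpson(counts: Dict[str, int]) -> float:
    """Count-only Simpson diversity in the 1 - sum(p^2) form; ignores the tree."""
    total = sum(counts.values())
    return 1 - sum((v / total) ** 2 for v in counts.values())


tree: Tree = {"X": ("R", 1.0), "Y": ("R", 2.0), "A": ("X", 0.5), "B": ("X", 0.5), "C": ("Y", 1.0)}
counts = {"A": 2, "B": 1, "C": 1}
print(phylogenetic_diversity(counts, tree), quadratic_entropy(counts, tree), gini_simpson(counts))
```

Changing the counts to {"A": 100, "B": 1, "C": 1} leaves PD at 5.0 but lowers the quadratic entropy, because PD sees only which taxa are present.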

Our paper: Integrating influenza antigenic dynamics with molecular evolution

This guest post is by Trevor Bedford (@trvrb) on his paper (along with coauthors): Bedford et al., Integrating influenza antigenic dynamics with molecular evolution, available on arXiv.

The influenza virus shows a remarkable capacity to evolve to escape human immunity. Many other viruses, like measles, do not have this capacity. After infection with measles, a person gains life-long immunity to the virus, and hence measles has become constrained to be a childhood infection. Continual antigenic evolution in influenza necessitates frequent vaccine updates to provide sufficient protection against circulating strains.

Antigenic differences between strains are commonly quantified using the hemagglutination inhibition (HI) assay, which measures the ability of antibodies created against one strain to interfere with virus from another strain. The resulting HI data is represented as a sparse matrix of comparisons between viruses from strains A, B, C… and sera from strains X, Y, Z… Taken by itself, this matrix is difficult to work with. Experienced virologists can pick up the loss of reactivity between groups of viruses in the noisy HI data, but these patterns are not fully quantified.

In our new paper, available on the arXiv, we extend techniques of multidimensional scaling (MDS) pioneered by Derek Smith and colleagues for the analysis of influenza antigenic data. Here, we attempted to bring the MDS antigenic model into a fully Bayesian framework and refer to the revised technique as Bayesian MDS (BMDS). In this model, viruses and sera are represented as 2D coordinates on an antigenic map in which their pairwise distances yield expectations for the HI titers, with antigenically similar viruses lying close to one another and antigenically distant viruses lying far apart.
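The core cartography idea is to place viruses and sera in 2D so that map distances reproduce distances derived from HI titers. Plain least-squares metric MDS, fitted by gradient descent over the observed (virus, serum) pairs only, illustrates it; missing entries of the sparse matrix are simply skipped. This is a non-Bayesian sketch, not BMDS and not the Smith et al. software. The titer-to-distance convention (a serum's highest log2 titer minus the observed log2 titer), the step size and the iteration count are our assumptions, and the virus/serum names and titers are made up.

```python
import math
import random
from typing import Dict, Tuple

Titers = Dict[Tuple[str, str], float]  # (virus, serum) -> value; unobserved pairs are absent


def titers_to_targets(log2_titers: Titers) -> Titers:
    """Assumed convention: target distance = the serum's highest log2 titer minus this titer."""
    best: Dict[str, float] = {}
    for (_, serum), h in log2_titers.items():
        best[serum] = max(best.get(serum, h), h)
    return {(v, s): best[s] - h for (v, s), h in log2_titers.items()}


def fit_map(targets: Titers, dim: int = 2, steps: int = 2000, lr: float = 0.02, seed: int = 0):
    """Least-squares metric MDS by gradient descent, using only the observed pairs."""
    rng = random.Random(seed)
    nodes = {("virus", v) for v, _ in targets} | {("serum", s) for _, s in targets}
    # sorted() makes the random initial layout reproducible for a given seed.
    pos = {n: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for n in sorted(nodes)}
    for _ in range(steps):
        grad = {n: [0.0] * dim for n in pos}
        for (v, s), d in targets.items():
            a, b = pos[("virus", v)], pos[("serum", s)]
            diff = [x - y for x, y in zip(a, b)]
            dist = math.hypot(*diff) or 1e-9  # guard against division by zero
            coef = 2.0 * (dist - d) / dist    # d/da of (dist - d)^2 is coef * diff
            for k in range(dim):
                grad[("virus", v)][k] += coef * diff[k]
                grad[("serum", s)][k] -= coef * diff[k]
        for n, g in grad.items():
            for k in range(dim):
                pos[n][k] -= lr * g[k]
    return pos


def stress(targets: Titers, pos) -> float:
    """Sum of squared differences between map distances and target distances."""
    return sum((math.dist(pos[("virus", v)], pos[("serum", s)]) - d) ** 2
               for (v, s), d in targets.items())


# Hypothetical sparse HI table (log2 titers); virus B was never tested against serum sC.
log2_titers = {("A", "sA"): 9, ("B", "sA"): 7, ("C", "sA"): 5, ("A", "sC"): 5, ("C", "sC"): 10}
targets = titers_to_targets(log2_titers)
before = stress(targets, fit_map(targets, steps=0))
after = stress(targets, fit_map(targets))
print(f"stress before: {before:.3f}, after: {after:.3f}")
```

Because the objective only involves observed pairs, a missing titer simply contributes no term. MDS maps are defined only up to rotation, reflection and translation, and different seeds can also land in different local minima.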

By placing antigenic cartography in a Bayesian context, we are able to integrate other data sources, most notably sequence data. In this case, genetic sequences provide an evolutionary tree relating virus strains, and we assume that antigenic location evolves along this tree as a 2D diffusion process. This process imposes a prior on antigenic locations in which evolutionarily similar viruses have a prior expectation of lying close to one another on the map. In the paper, we use this BMDS / diffusion model to investigate patterns of antigenic evolution in four circulating lineages of influenza and show that antigenic drift determines, to a large degree, incidence patterns across time and across lineages.
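A 2D diffusion along the tree can be sketched under its simplest reading: isotropic Brownian motion. Each child's antigenic location is its parent's plus independent Gaussian displacements with variance σ² times the branch length in each dimension. The expected squared map distance between two tips is then 2σ² times the total branch length separating them. The sketch below simulates that process. The root placed at the origin, σ², and the toy tree are assumptions, and this is not the BEAST implementation.

```python
import math
import random
from typing import Dict, List, Optional, Tuple


def simulate_diffusion(children: Dict[str, List[str]], branch_length: Dict[str, float],
                       root: str, sigma2: float = 1.0,
                       seed: Optional[int] = None) -> Dict[str, Tuple[float, float]]:
    """Isotropic 2D Brownian motion along a rooted tree, starting from (0, 0) at the root.

    A child's location is its parent's plus independent N(0, sigma2 * t) steps
    in each dimension, where t is the length of the branch above the child.
    """
    rng = random.Random(seed)
    loc = {root: (0.0, 0.0)}
    stack = [root]
    while stack:
        node = stack.pop()
        x, y = loc[node]
        for child in children.get(node, []):
            sd = math.sqrt(sigma2 * branch_length[child])
            loc[child] = (x + rng.gauss(0.0, sd), y + rng.gauss(0.0, sd))
            stack.append(child)
    return loc


# Hypothetical tree: A and B are sisters under P; C hangs directly off the root.
children = {"root": ["P", "C"], "P": ["A", "B"]}
branch_length = {"P": 1.0, "A": 0.5, "B": 0.5, "C": 2.0}
reps = 5000
sq_ab = sq_ac = 0.0
for i in range(reps):
    loc = simulate_diffusion(children, branch_length, "root", seed=i)
    sq_ab += math.dist(loc["A"], loc["B"]) ** 2 / reps
    sq_ac += math.dist(loc["A"], loc["C"]) ** 2 / reps
print(f"E|A-B|^2 ~ {sq_ab:.2f} (theory 2.0), E|A-C|^2 ~ {sq_ac:.2f} (theory 7.0)")
```

Sisters A and B are separated by 1.0 units of branch length and A and C by 3.5, so with σ² = 1 the expected squared distances are 2.0 and 7.0. Close relatives thus tend to lie close together on the map, which is the prior expectation described above.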

The paper is also up on GitHub, which I’ll keep updating as it goes through the review process. The BMDS model is implemented in the software package BEAST and is available in the latest source code. I hope to provide tutorials on running the BMDS model in the not-too-distant future.