Sequencing ultra-long DNA molecules with the Oxford Nanopore MinION

John M Urban, Jacob Bliss, Charles E Lawrence, Susan A Gerbi
doi: http://dx.doi.org/10.1101/019281

Oxford Nanopore Technologies’ nanopore sequencing device, the MinION, holds the promise of sequencing ultra-long DNA fragments >100kb. An obstacle to realizing this promise is delivering ultra-long DNA molecules to the nanopores. We present our progress in developing cost-effective ways to overcome this obstacle and our resulting MinION data, including multiple reads >100kb.

Ecological and evolutionary adaptations shape the gut microbiome of BaAka African rainforest hunter-gatherers

Andres Gomez, Klara Petrzelkova, Carl J Yeoman, Michael B Burns, Katherine R Amato, Klara Vlckova, David Modry, Angelique Todd, Carolyn A Jost Robinson, Melissa Remis, Manolito Torralba, Karen E Nelson, Franck Carbonero, H Rex Gaskins, Brenda A Wilson, Rebecca M Stumpf, Bryan A White, Steven R Leigh, Ran Blekhman
doi: http://dx.doi.org/10.1101/019232

The gut microbiome provides access to otherwise unavailable metabolic and immune functions, likely affecting mammalian fitness and evolution. To investigate how this microbial ecosystem impacts evolutionary adaptation of humans to particular habitats, we explore the gut microbiome and metabolome of the BaAka rainforest hunter-gatherers from Central Africa. The data demonstrate that the BaAka harbor a colonic ecosystem dominated by Prevotellaceae and other taxa likely related to an increased capacity to metabolize plant structural polysaccharides, phenolics, and lipids. A comparative analysis shows that the BaAka gut microbiome shares similar patterns with that of the Hadza, another hunter-gatherer population from Tanzania. Nevertheless, the BaAka harbor significantly higher bacterial diversity and pathogen load compared to the Hadza, as well as to Western populations. We show that the traits unique to the BaAka microbiome and metabolome likely reflect adaptations to hunter-gatherer lifestyles and particular subsistence patterns. We hypothesize that the observed increase in microbial diversity and potential pathogenicity in the BaAka microbiome has been facilitated by evolutionary adaptations in immunity genes, resulting in a more tolerant immune system.

Bayesian Nonparametric Inference of Population Size Changes from Sequential Genealogies

Julia A Palacios, John Wakeley, Sohini Ramachandran
doi: http://dx.doi.org/10.1101/019216

Sophisticated inferential tools coupled with the coalescent model have recently emerged for estimating past population sizes from genomic data. Accurate methods are available for data from a single locus or from independent loci. Recent methods that model recombination require small sample sizes, make constraining assumptions about population size changes, and do not report measures of uncertainty for estimates. Here, we develop a Gaussian process-based Bayesian nonparametric method coupled with a sequentially Markov coalescent model which allows accurate inference of population sizes over time from a set of genealogies. In contrast to current methods, our approach considers a broad class of recombination events, including those that do not change local genealogies. We show that our method outperforms recent likelihood-based methods that rely on discretization of the parameter space. We illustrate the application of our method to multiple demographic histories, including population bottlenecks and exponential growth. In simulation, our Bayesian approach produces point estimates four times more accurate than maximum likelihood estimation (based on the sum of absolute differences between the truth and the estimated values). Further, our method’s credible intervals for population size as a function of time cover 90 percent of true values across multiple demographic scenarios, enabling formal hypothesis testing about population size differences over time. Using genealogies estimated with ARGweaver, we apply our method to European and Yoruban samples from the 1000 Genomes Project and confirm key known aspects of population size history over the past 150,000 years.
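The two evaluation criteria mentioned in the abstract (the sum of absolute differences between true and estimated values, and the fraction of true values covered by credible intervals) can be illustrated with a minimal sketch. The function names and the toy numbers below are hypothetical and are not taken from the paper; the sketch only shows how such summaries might be computed on a common time grid.

```python
def sum_abs_diff(truth, estimate):
    """Sum of absolute differences between true and estimated values."""
    if len(truth) != len(estimate):
        raise ValueError("truth and estimate must have the same length")
    return sum(abs(t - e) for t, e in zip(truth, estimate))


def coverage(truth, lower, upper):
    """Fraction of true values lying inside [lower, upper] at each grid point."""
    if not (len(truth) == len(lower) == len(upper)):
        raise ValueError("all inputs must have the same length")
    inside = sum(1 for t, lo, hi in zip(truth, lower, upper) if lo <= t <= hi)
    return inside / len(truth)


if __name__ == "__main__":
    # Hypothetical population sizes on a 4-point time grid (illustration only).
    true_size = [1000, 800, 400, 900]
    point_est = [950, 820, 500, 880]
    ci_lower = [900, 700, 450, 800]
    ci_upper = [1100, 900, 600, 1000]
    print(sum_abs_diff(true_size, point_est))
    print(coverage(true_size, ci_lower, ci_upper))
```

A smaller sum of absolute differences indicates a more accurate point estimate, and a coverage close to the nominal level indicates well-calibrated intervals.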

Near-optimal RNA-Seq quantification

Nicolas Bray, Harold Pimentel, Páll Melsted, Lior Pachter
Subjects: Quantitative Methods (q-bio.QM); Computational Engineering, Finance, and Science (cs.CE); Data Structures and Algorithms (cs.DS); Genomics (q-bio.GN)

We present a novel approach to RNA-Seq quantification that is near optimal in speed and accuracy. Software implementing the approach, called kallisto, can be used to analyze 30 million unaligned RNA-Seq reads in less than 5 minutes on a standard laptop computer while providing results as accurate as those of the best existing tools. This removes a major computational bottleneck in RNA-Seq analysis.

Integration of experiments across diverse environments identifies the genetic determinants of variation in Sorghum bicolor seed element composition

Nadia Shakoor, Greg Ziegler, Brian P Dilkes, Zachary Brenton, Richard Boyles, Erin L Connolly, Stephen Kresovich, Ivan Baxter

Seedling establishment and seed nutritional quality require the sequestration of sufficient mineral nutrients. Identification of genes and alleles that modify element content in the grains of cereals, including Sorghum bicolor, is fundamental to developing breeding and selection methods aimed at increasing bioavailable mineral content and improving crop growth. We have developed a high-throughput workflow for the simultaneous measurement of multiple elements in Sorghum seeds. We measured seed element levels in the genotyped Sorghum Association Panel (SAP), representing all major cultivated sorghum races from diverse geographic and climatic regions, and mapped alleles contributing to seed element variation across three environments by genome-wide association. We observed significant phenotypic and genetic correlation between several elements across multiple years and diverse environments. The power of combining high-precision measurements with genome-wide association was demonstrated by implementing rank transformation and a multilocus mixed model (MLMM) to map alleles controlling 20 element traits, identifying 255 loci affecting the sorghum seed ionome. Sequence similarity to genes characterized in previous studies identified likely causative genes for the accumulation of zinc (Zn), manganese (Mn), nickel (Ni), calcium (Ca) and cadmium (Cd) in sorghum seed. In addition to strong candidates for these elements, we provide a list of candidate loci for several other elements. Our approach enabled identification of SNPs in strong LD with causative polymorphisms that can be used directly in plant breeding and improvement.

Coalescent times and patterns of genetic diversity in species with facultative sex: effects of gene conversion, population structure and heterogeneity

Matthew Hartfield, Stephen I. Wright, Aneil F. Agrawal

Many diploid organisms undergo facultative sexual reproduction. However, little is currently known concerning the distribution of neutral genetic variation amongst facultative sexuals except in very simple cases. Understanding this distribution is important when making inferences about rates of sexual reproduction, effective population size and demographic history. Here, we extend coalescent theory in diploids with facultative sex to consider gene conversion, selfing, population subdivision, and temporal and spatial heterogeneity in rates of sex. In addition to analytical results for two-sample coalescent times, we outline a coalescent algorithm that accommodates the complexities arising from partial sex; this algorithm can be used to generate multi-sample coalescent distributions. A key result is that when sex is rare, gene conversion becomes a significant force in reducing diversity within individuals, which can remove genomic signatures of infrequent sex (the ‘Meselson Effect’) or entirely reverse the predictions. Our models offer improved methods for assessing the null model (i.e. neutrality) of patterns of molecular variation in facultative sexuals.

Bayesian Inference of Divergence Times and Feeding Evolution in Grey Mullets (Mugilidae)

Francesco Santini, Michael R. May, Giorgio Carnevale, Brian R. Moore
doi: http://dx.doi.org/10.1101/019075

Grey mullets (Mugilidae, Ovalentariae) are coastal fishes found in near-shore environments of tropical, subtropical, and temperate regions within marine, brackish, and freshwater habitats throughout the world. This group is noteworthy both for the highly conserved morphology of its members (which complicates species identification and delimitation) and for the uncommon herbivorous or detritivorous diet of most mullets. In this study, we first attempt to identify the number of mullet species, and then, for the resulting species, estimate a densely sampled time-calibrated phylogeny using three mitochondrial gene regions and three fossil calibrations. Our results identify two major subgroups of mullets that diverged in the Paleocene/Early Eocene, followed by an Eocene/Oligocene radiation across both tropical and subtropical habitats. We use this phylogeny to explore the evolution of feeding preference in mullets, which indicates multiple independent origins of both herbivorous and detritivorous diets within this group. We also explore correlations between feeding preference and other variables, including body size, habitat (marine, brackish, or freshwater), and geographic distribution (tropical, subtropical, or temperate). Our analyses reveal: (1) a positive correlation between trophic index and habitat (with herbivorous and/or detritivorous species predominantly occurring in marine habitats); (2) a negative correlation between trophic index and geographic distribution (with herbivorous species occurring predominantly in subtropical and temperate regions); and (3) a negative correlation between body size and geographic distribution (with larger species occurring predominantly in subtropical and temperate regions).

Mitochondria, mutations and sex: a new hypothesis for the evolution of sex based on mitochondrial mutational erosion

Justin Havird, Matthew D Hall, Damian Dowling
doi: http://dx.doi.org/10.1101/019125

The evolution of sex in eukaryotes represents a paradox, given the “two-fold” fitness cost it incurs. We hypothesize that the mutational dynamics of the mitochondrial genome would have favoured the evolution of sexual reproduction. Mitochondrial DNA (mtDNA) exhibits a high mutation rate across most eukaryote taxa, and several lines of evidence suggest this high rate is an ancestral character. This seems inexplicable given mtDNA-encoded genes underlie the expression of life’s most salient functions, including energy conversion. We propose that negative metabolic effects linked to mitochondrial mutation accumulation would have invoked selection for sexual recombination between divergent host nuclear genomes in early eukaryote lineages. This would provide a mechanism by which recombinant host genotypes could be rapidly shuffled and screened for the presence of compensatory modifiers that offset mtDNA-induced harm. Under this hypothesis, recombination provides the genetic variation necessary for compensatory nuclear coadaptation to keep pace with mitochondrial mutation accumulation.

Long-term survival of duplicate genes despite absence of subfunctionalized expression.

Xun Lan, Jonathan K Pritchard
doi: http://dx.doi.org/10.1101/019166

Gene duplication is a fundamental process in genome evolution. However, young duplicates are frequently degraded into pseudogenes by loss-of-function mutations. One standard model proposes that the main path for duplicate genes to avoid mutational destruction is by rapidly evolving subfunctionalized expression profiles. We examined this hypothesis using RNA-seq data from 46 human tissues. Surprisingly, we find that sub- or neofunctionalization of expression evolves very slowly, and is rare among duplications that arose within the placental mammals. Most mammalian duplicates are located in tandem and have highly correlated expression profiles, likely due to shared regulation, thus impeding subfunctionalization. Moreover, we find that a large fraction of duplicate gene pairs exhibit a striking asymmetric pattern in which one gene has consistently higher expression. These asymmetrically expressed duplicates (AEDs) may persist for tens of millions of years, even though the lower-expressed copies tend to evolve under reduced selective constraint and are associated with fewer human diseases than their duplicate partners. We suggest that dosage-sharing of expression, rather than subfunctionalization, is more likely to be the initial factor enabling survival of duplicate gene pairs.

A basic mathematical model for the Lenski experiment and the deceleration of the relative fitness

Adrián González Casanova, Noemi Kurt, Anton Wakolbinger, Linglong Yuan
Subjects: Probability (math.PR); Populations and Evolution (q-bio.PE)

The Lenski experiment investigates the long-term evolution of bacterial populations. Its design allows the direct comparison of the reproductive fitness of an evolved strain with its founder ancestor. It was observed by Wiser et al. (2013) that the mean fitness over time increases sublinearly, a behaviour which is commonly attributed to effects like clonal interference or epistasis. In this paper we present an individual-based probabilistic model that captures essential features of the design of the Lenski experiment. We assume that each beneficial mutation increases the individual reproduction rate by a fixed amount, which corresponds to the absence of epistasis in the continuous-time (intraday) part of the model, but leads to an epistatic effect in the discrete-time (interday) part of the model. Using an approximation by near-critical Galton-Watson processes, we prove that under some assumptions on the model parameters which exclude clonal interference, the relative fitness process converges, after suitable rescaling, in the large population limit to a power law function.
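One way to see how a fixed increase in reproduction rate can still produce an epistatic effect between days is the following toy calculation. It is a sketch under assumptions that are not stated in the abstract: growth is deterministic and exponential, and each day ends once the resident population has grown by a fixed factor gamma (the function name and parameters r, rho and gamma are hypothetical).

```python
import math


def per_day_advantage(r, rho, gamma):
    """Ratio of mutant to resident growth over one day (toy assumption).

    r     -- resident reproduction rate
    rho   -- fixed rate increase conferred by one beneficial mutation
    gamma -- factor by which the resident population grows before the day ends
    """
    # Assumed: the day lasts until the resident has grown gamma-fold.
    day_length = math.log(gamma) / r
    resident_growth = math.exp(r * day_length)          # equals gamma
    mutant_growth = math.exp((r + rho) * day_length)    # equals gamma ** (1 + rho / r)
    return mutant_growth / resident_growth              # equals gamma ** (rho / r)


if __name__ == "__main__":
    # The same fixed increment rho gives a smaller per-day advantage
    # as the resident rate r rises.
    for r in (1.0, 2.0, 4.0):
        print(r, per_day_advantage(r, rho=0.1, gamma=100.0))
```

Under these assumptions the per-day advantage equals gamma raised to rho / r, so each additional mutation of the same size yields a smaller interday benefit as the resident rate grows, which is consistent with a decelerating relative fitness.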