Some mathematical tools for the Lenski experiment

Bernard Ycart (LJK), Agnès Hamon (LJK), Joël Gaffé (LAPM), Dominique Schneider (LAPM)
(Submitted on 2 Oct 2013)

The Lenski experiment is a long-term daily reproduction of Escherichia coli that has evidenced phenotypic and genetic evolution over the years. Some mathematical models that could be useful in understanding the results of that experiment are reviewed here: stochastic and deterministic growth, mutation appearance and fixation, and competition of species.
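As a rough illustration of the deterministic growth and competition models the abstract mentions, the sketch below is not taken from the paper. It assumes a daily 100-fold dilution followed by regrowth by binary fission (the commonly described protocol for this experiment), and a textbook two-strain model in which the mutant-to-ancestor ratio grows by a factor of (1 + s) per generation. The function names and the value of s are hypothetical.

```python
import math


def generations_per_cycle(dilution_factor: float) -> float:
    """Doublings needed to regrow after a dilution (assumes binary fission)."""
    return math.log2(dilution_factor)


def mutant_frequency(p0: float, s: float, generations: float) -> float:
    """Deterministic frequency of a mutant with selective advantage s.

    Assumption: the odds p / (1 - p) are multiplied by (1 + s) each generation.
    """
    odds = p0 / (1.0 - p0) * (1.0 + s) ** generations
    return odds / (1.0 + odds)


# Assumed protocol: a 100-fold dilution every day.
gens_per_day = generations_per_cycle(100)

# Hypothetical mutant starting at frequency 1e-6 with a 10% advantage,
# followed over 30 daily cycles.
freq_after_30_days = mutant_frequency(1e-6, 0.10, 30 * gens_per_day)
```

A stochastic treatment (for example, sampling at each transfer bottleneck) would be needed to model the appearance and loss of rare mutants; this deterministic version only tracks the expected trend.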

Waste Not, Want Not: Why Rarefying Microbiome Data is Inadmissible

Paul J. McMurdie, Susan Holmes
(Submitted on 1 Oct 2013)

The interpretation of count data originating from the current generation of DNA sequencing platforms requires special attention. In particular, the per-sample library sizes often vary by orders of magnitude from the same sequencing run, and the counts are overdispersed relative to a simple Poisson model. These challenges can be addressed using an appropriate mixture model that simultaneously accounts for library size differences and biological variability. This approach is already well-characterized and implemented for RNA-Seq data in R packages such as edgeR and DESeq.
We use statistical theory, extensive simulations, and empirical data to show that variance stabilizing normalization using a mixture model like the negative binomial is appropriate for microbiome count data. In simulations detecting differential abundance, normalization procedures based on a Gamma-Poisson mixture model provided systematic improvement in performance over crude proportions or rarefied counts — both of which led to a high rate of false positives. In simulations evaluating clustering accuracy, we found that the rarefying procedure discarded samples that were nevertheless accurately clustered by alternative methods, and that the choice of minimum library size threshold was critical in some settings, but with an optimum that is unknown in practice. Techniques that use variance stabilizing transformations by modeling microbiome count data with a mixture distribution, such as those implemented in edgeR and DESeq, substantially improved upon techniques that attempt to normalize by rarefying or crude proportions. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.
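To make concrete what the rarefying procedure criticized above does, here is a minimal Python sketch; it is not the authors' code, and phyloseq itself is an R package. It assumes rarefying means subsampling each sample without replacement to a common depth and dropping samples whose library size falls below that depth. The toy count matrix, the depth of 1000, and the function name `rarefy` are hypothetical.

```python
import numpy as np


def rarefy(counts, depth, rng):
    """Subsample each row (sample) without replacement to `depth` reads.

    Rows with fewer than `depth` reads are dropped entirely.
    Returns the rarefied matrix and a boolean mask of the retained samples.
    """
    counts = np.asarray(counts, dtype=np.int64)
    keep = counts.sum(axis=1) >= depth
    rarefied = np.array(
        [rng.multivariate_hypergeometric(row, depth) for row in counts[keep]]
    )
    return rarefied, keep


rng = np.random.default_rng(0)

# Hypothetical toy data: 3 samples (rows) by 4 taxa (columns), with library
# sizes that differ by orders of magnitude.
counts = np.array(
    [
        [500, 300, 150, 50],
        [50000, 30000, 15000, 5000],
        [20, 10, 5, 5],
    ]
)

rarefied, keep = rarefy(counts, depth=1000, rng=rng)

# Fraction of all observed reads that rarefying throws away.
discarded_fraction = 1 - rarefied.sum() / counts.sum()
```

In this toy example the smallest sample is dropped and most reads from the deepest sample are discarded, which is the loss of information the abstract argues against.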

Identical inferences about correlated evolution arise from ancestral state reconstruction and independent contrasts

Michael G. Elliot
(Submitted on 30 Sep 2013)

Inferences about the evolution of continuous traits based on reconstruction of ancestral states have often been considered more error-prone than analysis of independent contrasts. Here we show that both methods in fact yield identical estimators for the correlation coefficient and regression gradient of correlated traits, indicating that reconstructed ancestral states are a valid source of information about correlated evolution. We show that the independent contrast associated with a pair of sibling nodes on a phylogenetic tree can be expressed in terms of the maximum likelihood ancestral state function at those nodes and their common parent. This expression gives rise to novel formulae for independent contrasts for any model of evolution admitting of a local likelihood function. We thus derive new formulae for independent contrasts applicable to traits evolving under directional drift, and use simulated data to show that these directional contrasts provide better estimates of evolutionary model parameters than standard independent contrasts, when traits in fact evolve with a directional tendency.
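For readers unfamiliar with the standard method the abstract compares against, the following is a minimal sketch of standardized independent contrasts under Brownian motion (the classic pruning recursion), not the new directional-drift formulae derived in the paper. The `Node` class, the three-tip tree, and the trait values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    length: float                    # branch length leading to this node
    name: Optional[str] = None       # tip label (tips only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def independent_contrasts(
    node: Node, trait: Dict[str, float], out: List[float]
) -> Tuple[float, float]:
    """Append standardized contrasts to `out`; return (state, adjusted length)."""
    if node.left is None or node.right is None:
        return trait[node.name], node.length
    x1, v1 = independent_contrasts(node.left, trait, out)
    x2, v2 = independent_contrasts(node.right, trait, out)
    out.append((x1 - x2) / math.sqrt(v1 + v2))
    # Weighted average of the daughters, weights inversely proportional to length.
    state = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch to account for uncertainty in the estimated state.
    return state, node.length + v1 * v2 / (v1 + v2)


def correlation_through_origin(cx: List[float], cy: List[float]) -> float:
    """Correlation of two contrast vectors, computed through the origin."""
    num = sum(a * b for a, b in zip(cx, cy))
    return num / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))


# Hypothetical tree ((A:1,B:1):1,C:2) with two traits measured at the tips.
tree = Node(0.0, left=Node(1.0, left=Node(1.0, "A"), right=Node(1.0, "B")),
            right=Node(2.0, "C"))
x = {"A": 1.0, "B": 3.0, "C": 5.0}
y = {"A": 3.0, "B": 7.0, "C": 11.0}   # y = 2x + 1 at every tip

cx: List[float] = []
cy: List[float] = []
independent_contrasts(tree, x, cx)
independent_contrasts(tree, y, cy)
r = correlation_through_origin(cx, cy)
```

The weighted average computed at each internal node is a local ancestral state estimate, which is the link between contrasts and reconstructed states that the abstract builds on.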

Integrating diverse datasets improves developmental enhancer prediction

Genevieve D. Erwin, Rebecca M. Truty, Dennis Kostka, Katherine S. Pollard, John A. Capra
(Submitted on 27 Sep 2013)

Gene-regulatory enhancers have been identified by many lines of evidence, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a novel method for predicting developmental enhancers and their tissue specificity. EnhancerFinder uses a two-step multiple-kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and thousands of diverse functional genomics datasets from a variety of cell types and developmental stages. We trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser, in contrast to histone mark or sequence-based enhancer definitions commonly used. We comprehensively evaluated EnhancerFinder, and found that our integrative approach improves enhancer prediction accuracy over previous approaches that consider a single type of data. Our evaluation highlights the importance of considering information from many tissues when predicting specific types of enhancers. We find that VISTA enhancers active in embryonic heart are easier to predict than enhancers active in several other tissues due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and hits from genome-wide association studies. We demonstrate the utility of our enhancer predictions by identifying and validating a novel cranial nerve enhancer in the ZEB2 locus. Our genome-wide developmental enhancer predictions will be freely available as a UCSC Genome Browser track.

Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication

Carlos Nossa, Paul Havlak, Jia-Xing Yue, Jie Lv, Kim Vincent, H Jane Brockmann, Nicholas H Putnam
(Submitted on 28 Sep 2013)

Horseshoe crabs are marine arthropods with a fossil record extending back approximately 450 million years. They exhibit remarkable morphological stability over their long evolutionary history, retaining a number of ancestral arthropod traits, and are often cited as examples of “living fossils.” As arthropods, they belong to the Ecdysozoa, an ancient super-phylum whose sequenced genomes (including insects and nematodes) have thus far shown more divergence from the ancestral pattern of eumetazoan genome organization than cnidarians, deuterostomes, and lophotrochozoans. However, much of ecdysozoan diversity remains unrepresented in comparative genomic analyses. Here we use a new strategy of combined de novo assembly and genetic mapping to examine the chromosome-scale genome organization of the Atlantic horseshoe crab Limulus polyphemus. We constructed a genetic linkage map of this 2.7 Gbp genome by sequencing the nuclear DNA of 34 wild-collected, full-sibling embryos and their parents at a mean redundancy of 1.1x per sample. The map includes 84,307 sequence markers and 5,775 candidate conserved protein coding genes. Comparison to other metazoan genomes shows that the L. polyphemus genome preserves ancestral bilaterian linkage groups, and that a common ancestor of modern horseshoe crabs underwent one or more ancient whole genome duplications (WGDs) ~300 MYA, followed by extensive chromosome fusion.

Most viewed on Haldane’s Sieve: September 2013

The most viewed preprints on Haldane’s Sieve this month were: