Family-joining: A fast distance-based method for constructing generally labeled trees

Prabhav Kalaghatgi, Nico Pfeifer, Thomas Lengauer

The most widely used model for evolutionary relationships is a bifurcating tree with all taxa/observations placed at the leaves. This is not appropriate for taxa that have been densely sampled across evolutionary time and may be in a direct ancestral relationship. In this paper, we present a fast distance-based agglomeration method called family-joining (FJ) for constructing so-called generally labeled trees, in which taxa may be placed at internal vertices and the tree may contain polytomies. FJ constructs such trees on the basis of pairwise distances and a distance threshold. We tested two methods for threshold selection, FJ-BIC and FJ-CV, which minimize BIC and CV error, respectively. When compared with related methods on simulated data, FJ-BIC was among the best at reconstructing the correct tree across a wide range of simulation scenarios. FJ-BIC was applied to HIV sequences sampled from individuals involved in a known transmission chain. The FJ-BIC tree was found to be compatible with almost all transmission events. On average, internal branches in the FJ-BIC tree have higher bootstrap support than branches in the leaf-labeled bifurcating tree constructed using RAxML: bootstrap support exceeds 70% for 36% of the internal branches in the FJ-BIC tree and for 25% in the RAxML tree. To the best of our knowledge, the method presented here is the first attempt at modeling the evolutionary relationships of densely sampled pathogens using generally labeled trees.

variancePartition: Interpreting drivers of variation in complex gene expression studies

Gabriel E Hoffman, Eric E Schadt

Macroevolutionary trade-offs in plant-feeding insects

Daniel Peterson, Nate B. Hardy, Benjamin B. Normark

Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

Po-E Li, Chien-Chi Lo, Joseph J. Anderson, Karen W. Davenport, Kimberly A. Bishop-Lilly, Yan Xu, Sanaa Ahmed, Shihai Feng, Vishwesh P. Mokashi, Patrick S. G. Chain

DNA Methylation profiles of diverse Brachypodium distachyon aligns with underlying genetic diversity

Steven R Eichten, Tim Stuart, Akanksha Srivastava, Ryan Lister, Justin O Borevitz

Assessing the relationship between height growth and molecular genetic variation in Douglas-fir (Pseudotsuga menziesii) provenances

Charalambos Neophytou, Anna-Maria Weisser, Daniel Landwehr, Muhidin Šeho, Ulrich Kohnle, Ingo Ensminger, Henning Wildhagen

MetaPalette: A K-mer painting approach for metagenomic taxonomic profiling and quantification of novel strain variation

David Koslicki, Daniel Falush

A simple, general result for the variance of substitution number in molecular evolution

Bahram Houchmandzadeh, Marcel Vallade
(Submitted on 16 Feb 2016)

The number of substitutions (of nucleotides, amino acids, …) that take place during the evolution of a sequence is a stochastic variable of fundamental importance in the field of molecular evolution. Although the mean number of substitutions during molecular evolution of a sequence can be estimated for a given substitution model, no simple solution exists for the variance of this random variable. We show in this article that the computation of the variance is as simple as that of the mean number of substitutions for both short and long times. Apart from its fundamental importance, this result can be used to investigate the dispersion index R, i.e. the ratio of the variance to the mean substitution number, which is of prime importance in the neutral theory of molecular evolution. By investigating large classes of substitution models, we demonstrate that although R ≥ 1, to obtain R significantly larger than unity necessitates in general additional hypotheses on the structure of the substitution model.
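
As a small illustration of the dispersion index defined in the abstract above (the ratio of the variance to the mean substitution number), the following Python sketch estimates R from a sample of substitution counts. It is not taken from the paper: the function name `dispersion_index`, the example counts, and the choice of the population variance are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def dispersion_index(counts):
    """Estimate R = variance / mean from a sample of substitution counts.

    Uses the population variance (an assumption of this sketch); a
    sample-variance estimator would give a slightly larger value.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean substitution number is zero; R is undefined")
    return pvariance(counts) / m


# Hypothetical substitution counts from six independent lineages.
example_counts = [2, 9, 4, 8, 1, 6]
r_estimate = dispersion_index(example_counts)
print(f"Estimated dispersion index R = {r_estimate:.3f}")
```

For a Poisson substitution process the variance equals the mean, so R = 1; an estimate well above 1 indicates over-dispersion relative to that baseline.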

The hidden complexity of Mendelian traits across yeast natural populations

Jing Hou, Anastasie Sigwalt, David Pflieger, Jackson Peter, Jacky de Montigny, Maitreya Dunham, Joseph Schacherer

annotatr: Associating genomic regions with genomic annotations

Raymond G Cavalcante, Maureen A Sartor