Statistically-Consistent k-mer Methods for Phylogenetic Tree Reconstruction
Elizabeth S. Allman, John A. Rhodes, Seth Sullivant

Frequencies of k-mers in sequences are sometimes used as a basis for inferring phylogenetic trees without first obtaining a multiple sequence alignment. We show that a standard approach of using the squared-Euclidean distance between k-mer vectors to approximate a tree metric can be statistically inconsistent. To remedy this, we derive model-based distance corrections for orthologous sequences without gaps, which lead to consistent tree inference. The identifiability of model parameters from k-mer frequencies is also studied. Finally, we report simulations showing that the corrected distance outperforms many other k-mer methods, even when sequences are generated with an insertion and deletion process. These results have implications for multiple sequence alignment as well, since k-mer methods are usually the first step in constructing a guide tree for such algorithms.
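
For orientation, the uncorrected squared-Euclidean k-mer distance mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the DNA alphabet, and the choice to normalize counts to frequencies are all assumptions, and the model-based correction derived in the paper is not reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGT"):
    """Return the k-mer frequency vector of seq, ordered lexicographically.

    Assumption: windows containing characters outside `alphabet` are ignored.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[w] / total for w in kmers]


def squared_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmer_distance(s1, s2, k):
    """Uncorrected squared-Euclidean distance between k-mer frequency vectors."""
    return squared_euclidean(kmer_frequencies(s1, k), kmer_frequencies(s2, k))


if __name__ == "__main__":
    print(kmer_distance("ACGTACGTAC", "ACGTTCGTAC", 2))
```

A pairwise matrix of such values is what would typically be handed to a distance-based tree builder; the abstract's point is that, without a correction, this quantity can lead to inconsistent tree inference.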

The single-species metagenome: subtyping Staphylococcus aureus core genome sequences from shotgun metagenomic data

Sandeep J Joseph, Ben Li, Robert A Petit, Zhaohui Qin, Lyndsey Darrow, Timothy D Read

chopBAI: BAM index reduction solves I/O bottlenecks in the joint analysis of large sequencing cohorts

Birte Kehr, Páll Melsted

What’s in my pot? Real-time species identification on the MinION

Sissel Juul, Fernando Izquierdo, Adam Hurst, Xiaoguang Dai, Amber Wright, Eugene Kulesha, Roger Pettett, Daniel J Turner

Using genotype data to distinguish pleiotropy from heterogeneity: deciphering coheritability in autoimmune and neuropsychiatric diseases

Buhm Han, Jennie G Pouget, Kamil Slowikowski, Eli Stahl, Cue Hyunkyu Lee, Dorothee Diogo, Xinli Hu, Yu Rang Park, Eunji Kim, Peter K Gregersen, Solbritt Rantapaa Dahqvist, Jane Worthington, Steve Eyre, Lars Klareskog, Tom Huizinga, Wei-Min Chen, Suna Onengut-Gumuscu, Stephen S Rich, Major Depressive Disorder Working Group of the PGC, Naomi Wray, Soumya Raychaudhuri

Common and phylogenetically widespread coding for peptides by bacterial small RNAs

Robin C Friedman, Stefan Kalkhof, Olivia Doppelt-Azeroual, Stephan Mueller, Martina Chovancova, Martin von Bergen, Benno Schwikowski

The game of survival: Sexual evolution in dynamic environments
Ruta Mehta, Ioannis Panageas, Georgios Piliouras, Prasad Tetali, Vijay V. Vazirani

Evolution is a complex algorithmic solution to life’s most pressing challenge, that of survival. It is a mixture of numerous textbook optimization techniques. Natural selection, the preferential replication of the fittest, encodes the multiplicative weights update algorithm, which in static environments is tantamount to exponential growth for the best solution. Sex can be interpreted as a game between different agents/genes with identical interests, maximizing the fitness of the individual. Mutation forces the exploration of consistently suboptimal solutions. Are all of these mechanisms necessary to ensure survival? Also, how is it that despite their contradictory character (e.g., selection versus mutation) they do not cancel each other out? We address these questions by extending classic evolutionary models to allow for a dynamically changing environment. Sexual selection is well suited for static environments, where we show that it converges polynomially fast to monomorphic populations. Mutations make the difference in dynamic environments. Without them, species become extinct as they do not have the flexibility to recover fast given environmental change. On the other hand, we show that with mutation, as long as the rate of change of the environment is not too fast, long-term survival is possible. Finally, mutation does not cancel the role of selection in static environments. Convergence remains guaranteed and only the level of polymorphism of the equilibria is affected. Our techniques quantify exploration-exploitation tradeoffs in time-evolving non-convex optimization problems, which could be of independent interest.
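
The link between selection and multiplicative weights in a static environment can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the model analyzed in the paper: it uses a single asexual population with fixed fitness values, no sex and no mutation, and the function names are hypothetical. Each type's frequency is multiplied by its fitness and the result is renormalized, so the ratio between the fittest type and any other grows exponentially.

```python
def selection_step(freqs, fitness):
    """One multiplicative-weights style update: reweight by fitness, renormalize."""
    weighted = [x * f for x, f in zip(freqs, fitness)]
    total = sum(weighted)
    return [w / total for w in weighted]


def run_selection(freqs, fitness, steps):
    """Apply the selection update repeatedly in a static (unchanging) environment."""
    for _ in range(steps):
        freqs = selection_step(freqs, fitness)
    return freqs


if __name__ == "__main__":
    # Two types with hypothetical fitness values 2.0 and 1.0, starting at equal frequency.
    # After t steps the frequency ratio is 2**t, so the fitter type takes over.
    print(run_selection([0.5, 0.5], [2.0, 1.0], 10))
```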

Quantifying Reticulation in Phylogenetic Complexes Using Homology
Kevin Emmett, Raul Rabadan

Reticulate evolutionary processes result in phylogenetic histories that cannot be modeled using a tree topology. Here, we apply methods from topological data analysis to molecular sequence data with reticulations. Using a simple example, we demonstrate the correspondence between nontrivial higher homology and reticulate evolution. We discuss the sensitivity of the standard filtration and show cases where reticulate evolution can fail to be detected. We introduce an extension of the standard framework and define the median complex as a construction to recover signal of the frequency and scale of reticulate evolution by inferring and imputing putative ancestral states. Finally, we apply our methods to two datasets from phylogenetics. Our work expands on earlier ideas of using topology to extract important evolutionary features from genomic data.

Whole genome duplication in coast redwood (Sequoia sempervirens) and its implications for explaining the rarity of polyploidy in conifers

Alison Dawn Scott, Noah Stenz, David Baum

Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads

Ivan Sovic, Kresimir Krizanovic, Karolj Skala, Mile Sikic