RAD sequencing enables unprecedented phylogenetic resolution and objective species delimitation in recalcitrant divergent taxa

Santiago Herrera, Timothy M. Shank
doi: http://dx.doi.org/10.1101/019745

Species delimitation is problematic in many taxa due to the difficulty of evaluating predictions from species delimitation hypotheses, which chiefly rely on subjective interpretations of morphological observations and/or DNA sequence data. This problem is exacerbated in recalcitrant taxa for which genetic resources are scarce and inadequate to resolve questions regarding evolutionary relationships and uniqueness. In this case study we demonstrate the empirical utility of restriction site associated DNA sequencing (RAD-seq) by unambiguously resolving phylogenetic relationships among recalcitrant octocoral taxa with divergences greater than 80 million years. We objectively infer robust species boundaries in the genus Paragorgia, which contains some of the most important ecosystem engineers in the deep sea, by testing alternative taxonomy-guided or unguided species delimitation hypotheses using the Bayes factors delimitation method (BFD*) with genome-wide single nucleotide polymorphism data. We present conclusive evidence rejecting the current morphological species delimitation model for the genus Paragorgia and indicating the presence of cryptic species boundaries associated with environmental variables. We argue that the suitability limits of RAD-seq for phylogenetic inferences in divergent taxa cannot be assessed in terms of absolute time, but depend on taxon-specific factors such as mutation rate, generation time and effective population size. We show that classic morphological taxonomy can greatly benefit from integrative approaches that provide objective tests of species delimitation hypotheses. Our results pave the way for addressing further questions in biogeography, species ranges, community ecology, population dynamics, conservation, and evolution in octocorals and other marine taxa.
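
As a rough illustration of how Bayes factor comparisons of delimitation models work in general, the sketch below ranks competing models by their log marginal likelihoods and reports 2 ln(BF) against the best one. The function names, the model labels and the numbers are hypothetical and are not taken from the paper; the verbal scale follows the commonly used Kass and Raftery (1995) thresholds, and the sign convention (positive values favour the first model) is an assumption of this sketch.

```python
def two_ln_bf(log_ml_a, log_ml_b):
    """Return 2 * ln(Bayes factor) in favour of model A over model B,
    given the log marginal likelihood of each model."""
    return 2.0 * (log_ml_a - log_ml_b)


def support_label(stat):
    """Verbal reading of 2 * ln(BF) on the Kass and Raftery (1995) scale."""
    if stat < 0:
        return "favours the other model"
    if stat < 2:
        return "barely worth mentioning"
    if stat < 6:
        return "positive"
    if stat <= 10:
        return "strong"
    return "very strong"


def rank_models(log_mls):
    """Pick the model with the highest log marginal likelihood and
    compute 2 * ln(BF) of that best model against every candidate."""
    best = max(log_mls, key=log_mls.get)
    table = {name: two_ln_bf(log_mls[best], value)
             for name, value in log_mls.items()}
    return best, table


# Hypothetical log marginal likelihoods, for illustration only.
example = {"current_taxonomy": -1000.0, "cryptic_split": -990.0}
best_model, comparison = rank_models(example)
for name, stat in comparison.items():
    print(name, stat, support_label(stat))
```

In practice the marginal likelihoods would come from a dedicated estimator (for example path sampling) run once per delimitation model; this sketch only covers the final comparison step.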

Character trees from transcriptome data: origin and individuation of morphological characters and the so-called “species signal”

Jacob Musser, Gunter Wagner
doi: http://dx.doi.org/10.1101/019380

We elaborate a framework for investigating the evolutionary history of morphological characters. We argue that morphological character trees generated from transcriptomes provide a useful tool for identifying causal gene expression differences underlying the development and evolution of morphological characters. They also enable rigorous testing of different models of morphological character evolution and origination, including the hypothesis that characters originate via divergence of repeated ancestral characters. Finally, morphological character trees provide evidence that character transcriptomes undergo concerted evolution. We argue that concerted evolution of transcriptomes can explain the so-called “species-specific clustering” found in several recent comparative transcriptome studies. The species signal is the phenomenon that transcriptomes cluster by species rather than character type, even though the characters are older than the respective species. We suggest that concerted gene expression evolution results from mutations that alter gene regulatory network interactions shared by the characters under comparison. Thus, character trees generated from transcriptomes allow us to investigate the variational independence, or individuation, of morphological characters at the level of genetic programs.

ReproPhylo: An Environment for Reproducible Phylogenomics

Amir Szitenberg, Max John, Mark L Blaxter, David H Lunt
doi: http://dx.doi.org/10.1101/019349

The reproducibility of experiments is key to the scientific process, and particularly necessary for accurate reporting of analyses in data-rich fields such as phylogenomics. We present ReproPhylo, a phylogenomic analysis environment developed to ensure experimental reproducibility, to facilitate the handling of large-scale data, and to assist methodological experimentation. Reproducibility and instantaneous repeatability are built into the ReproPhylo system and do not require user intervention or configuration, because it stores the experimental workflow as a single, serialized Python object containing explicit provenance and environment information. This ‘single file’ approach ensures the persistence of provenance across iterations of the analysis, with changes automatically managed by the version control program Git. ReproPhylo produces an extensive human-readable report, and generates a comprehensive experimental archive file, both of which are suitable for submission with publications. The system facilitates thorough experimental exploration of both parameters and data. ReproPhylo is a platform-independent CC0 Python module, and is easily installed as a Docker image, with a Jupyter GUI, or as a slimmer version in a Galaxy distribution.
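
The ‘single file’ idea can be sketched with the Python standard library alone. The example below is not ReproPhylo's API: the function names, the dictionary layout and the "aligner" step are all hypothetical, chosen only to show how a workflow, its parameters and basic environment information could travel together in one serialized object.

```python
import os
import pickle
import platform
import sys
import tempfile
import time


def new_workflow(name):
    """Create a workflow record that carries its own environment details."""
    return {
        "name": name,
        "steps": [],
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }


def record_step(workflow, tool, **params):
    """Append one analysis step with its parameters and a timestamp."""
    workflow["steps"].append(
        {"tool": tool, "params": params, "timestamp": time.time()}
    )


def save_workflow(workflow, path):
    """Serialize the whole workflow to a single file."""
    with open(path, "wb") as handle:
        pickle.dump(workflow, handle)


def load_workflow(path):
    """Restore a workflow, including its provenance, from that file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def roundtrip_demo():
    """Write a small hypothetical workflow to disk and read it back."""
    workflow = new_workflow("demo")
    record_step(workflow, "aligner", gap_open=1.5)  # hypothetical step
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "workflow.pkl")
        save_workflow(workflow, path)
        return load_workflow(path)


restored = roundtrip_demo()
print(restored["environment"]["python"], restored["steps"][0]["tool"])
```

Committing such a file to Git after each change would give a history of the analysis, which is in the spirit of what the abstract describes, although the details of how ReproPhylo does this are not given here.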

Capturing heterotachy through multi-gamma site models

Remco Bouckaert, Peter Lockhart
doi: http://dx.doi.org/10.1101/018101

Most methods for performing a phylogenetic analysis based on sequence alignments of gene data assume that the mechanism of evolution is constant through time. It is recognised that some sites do evolve somewhat faster than others, and this can be captured using a (gamma) rate heterogeneity model. Further, some species have shorter replication times than others, and this results in faster rates of substitution in some lineages. This feature of lineage-specific rate variation can be captured, to some extent, by using relaxed clock models. However, it is also clear that there are additional poorly characterised features of sequence data that can sometimes lead to extreme differences in lineage-specific rates. This variation is poorly captured by constant time reversible substitution models. The significance of extreme lineage-specific rate differences is that they lead both to errors in reconstructing evolutionary relationships and to biased estimates for the age of ancestral nodes. We propose a new model that allows gamma rate heterogeneity to change on branches, thus offering a more realistic model of sequence evolution. It adds negligible computational cost to likelihood calculations. We illustrate its effectiveness with an example of green algae and land plants. For many real-world data sets, we find a much better fit with multi-gamma site models as well as substantial differences in ancestral node date estimates.
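
To make the notion of gamma rate heterogeneity concrete, the sketch below approximates the usual equal-probability discrete gamma rate categories by Monte Carlo sampling, and then shows what it would mean for the shape parameter to differ between branches. This is only an illustration of the general idea: the abstract does not spell out the authors' parameterisation, and the function name, the sampling shortcut and the per-branch shape values are assumptions made here.

```python
import random


def discrete_gamma_rates(alpha, categories=4, draws=100_000, seed=1):
    """Approximate mean rates of equal-probability gamma categories.

    Rates are drawn from a gamma distribution with shape `alpha` and
    mean 1, sorted, split into equally sized bins, and each bin is
    summarised by its mean. The result is rescaled so the category
    rates average exactly 1. Any draws left over when `draws` is not
    a multiple of `categories` are ignored.
    """
    rng = random.Random(seed)
    samples = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(draws))
    size = draws // categories
    means = [sum(samples[i * size:(i + 1) * size]) / size
             for i in range(categories)]
    overall = sum(means) / categories
    return [m / overall for m in means]


# A single shape parameter shared by the whole tree (standard site model).
print(discrete_gamma_rates(0.5))

# Hypothetical per-branch shape parameters: each branch gets its own
# set of category rates, so among-site heterogeneity changes over the tree.
branch_alphas = {"branch_1": 0.5, "branch_2": 5.0}
branch_rates = {name: discrete_gamma_rates(a) for name, a in branch_alphas.items()}
for name, rates in branch_rates.items():
    print(name, [round(r, 3) for r in rates])
```

A small shape value spreads the category rates widely (a few fast sites, many slow ones), while a large value pulls all categories towards 1. Production software typically computes the categories from gamma quantiles rather than by sampling.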

Phylogenetic analysis supports a link between DUF1220 domain number and primate brain expansion

Fabian Zimmer, Stephen H Montgomery
doi: http://dx.doi.org/10.1101/018077

The expansion of DUF1220 domain copy number during human evolution is a dramatic example of rapid and repeated domain duplication. However, the phenotypic relevance of DUF1220 dosage is unknown. Although patterns of expression, homology and disease associations suggest a role in cortical development, this hypothesis has not been robustly tested using phylogenetic methods. Here, we estimate DUF1220 domain counts across 12 primate genomes using a nucleotide Hidden Markov Model. We then test a series of hypotheses designed to examine the potential evolutionary significance of DUF1220 copy number expansion. Our results suggest a robust association with brain size, and more specifically neocortex volume. In contradiction to previous hypotheses we find a strong association with postnatal brain development, but not with prenatal brain development. Our results provide further evidence of a conserved association between specific loci and brain size across primates, suggesting human brain evolution occurred through a continuation of existing processes.

Phylogenomic analyses support traditional relationships within Cnidaria

Felipe Zapata, Freya E Goetz, Stephen A Smith, Mark Howison, Stefan Siebert, Samuel Church, Steven M Sanders, Cheryl Lewis Ames, Catherine S McFadden, Scott C France, Marymegan Daly, Allen G Collins, Steven HD Haddock, Casey Dunn, Paulyn Cartwright
doi: http://dx.doi.org/10.1101/017632

Cnidaria, the sister group to Bilateria, is a highly diverse group of animals in terms of morphology, lifecycles, ecology, and development. How this diversity originated and evolved is not well understood because phylogenetic relationships among major cnidarian lineages are unclear, and recent studies present contrasting phylogenetic hypotheses. Here, we use transcriptome data from 15 newly-sequenced species in combination with 26 publicly available genomes and transcriptomes to assess phylogenetic relationships among major cnidarian lineages. Phylogenetic analyses using different partition schemes and models of molecular evolution, as well as topology tests for alternative phylogenetic relationships, support the monophyly of Medusozoa, Anthozoa, Octocorallia, Hydrozoa, and a clade consisting of Staurozoa, Cubozoa, and Scyphozoa. Support for the monophyly of Hexacorallia is weak due to the equivocal position of Ceriantharia. Taken together, these results further resolve deep cnidarian relationships, largely support traditional phylogenetic views on relationships, and provide a historical framework for studying the evolutionary processes involved in one of the most ancient animal radiations.

How complexity originates: The evolution of animal eyes

Todd H Oakley, Daniel I Speiser
doi: http://dx.doi.org/10.1101/017129

Learning how complex traits like eyes originate is fundamental for understanding evolution. Here, we first sketch historical perspectives on trait origins and argue that new technologies offer key new insights. Next, we articulate four open questions about trait origins. To address them, we define a research program to break complex traits into components and study the individual evolutionary histories of those parts. By doing so, we can learn when the parts came together and perhaps understand why they stayed together. We apply the approach to five structural innovations critical for complex eyes, reviewing the history of the parts of each of those innovations. Photoreceptors evolved within animals by bricolage, recombining genes that originated far earlier. Multiple genes used in eyes today had ancestral roles in stress responses. We hypothesize that photo-stress could have increased the chance those genes were expressed together in places on animals where light was abundant.

Calculating the Human Mutation Rate by Using a NUMT from the Early Oligocene

Ian Logan
doi: http://dx.doi.org/10.1101/016428

As the number of whole genomes available for study increases, so also does the opportunity to find unsuspected features hidden within our genetic code. One such feature allows the Human Mutation Rate in human chromosomes to be estimated. A NUMT is a small fragment of the mitochondrial DNA that enters the nucleus of a cell, gets captured by a chromosome and is thereafter passed on from generation to generation. Over the millions of years of evolution, this unexpected phenomenon has happened many times. But it is usually very difficult to say just when a NUMT might have been created. However, this paper presents evidence to show that for one particular NUMT the date of formation was around 29 million years ago, which places the event in the Early Oligocene, when our ancestors were small monkey-like creatures. So now all of us carry this NUMT in each of our cells, as do Old World Monkeys, the Great Apes and our nearest relations, the Chimpanzees. The estimate of the Human Mutation Rate obtained by the method outlined here gives a value which is higher than has been generally found; but this new value perhaps only applies to non-coding regions of the Human genome where there is little, if any, selection pressure against new mutations.

Detecting hidden diversification shifts in models of trait-dependent speciation and extinction

Jeremy M Beaulieu, Brian C O’Meara
doi: http://dx.doi.org/10.1101/016386

The distribution of diversity can vary considerably from clade to clade. Attempts to understand these patterns often employ state speciation and extinction models to determine whether the evolution of a particular novel trait has increased speciation rates and/or decreased extinction rates. It is still unclear, however, whether these models are uncovering important drivers of diversification, or whether they are simply pointing to more complex patterns involving many unmeasured and co-distributed factors. Here we describe an extension to the popular state speciation and extinction models that specifically accounts for the presence of unmeasured factors that could impact diversification rates estimated for the states of any observed trait. Specifically, our model, which we refer to as HiSSE (Hidden State Speciation and Extinction), assumes that related to each observed state in the model are “hidden” states whose diversification dynamics and transition rates potentially differ from those of the observed states in isolation. Under rigorous simulation tests and when applied to empirical data, we find that HiSSE performs reasonably well, and can at least detect net diversification rate differences between observed and hidden states. We also discuss the remaining issues with state speciation and extinction models in general, and the important ways in which HiSSE provides a more nuanced understanding of trait-dependent diversification.

PoMo: An Allele Frequency-based Approach for Species Tree Estimation

Nicola De Maio, Dominik Schrempf, Carolin Kosiol
doi: http://dx.doi.org/10.1101/016360

Incomplete lineage sorting can cause incongruencies of the overall species-level phylogenetic tree with the phylogenetic trees for individual genes or genomic segments. If these incongruencies are not accounted for, it is possible to incur several biases in species tree estimation. Here, we present a simple maximum likelihood approach that accounts for ancestral variation and incomplete lineage sorting. We use a POlymorphisms-aware phylogenetic MOdel (PoMo) that we have recently shown to efficiently estimate mutation rates and fixation biases from within and between-species variation data. We extend this model to perform efficient estimation of species trees. We test the performance of PoMo in several different scenarios of incomplete lineage sorting using simulations and compare it with existing methods both in accuracy and computational speed. In contrast to other approaches, our model does not use coalescent theory but is allele-frequency based. We show that PoMo is well suited for genome-wide species tree estimation and that on such data it is more accurate than previous approaches.
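
To show what "allele-frequency based" can mean in practice, the sketch below enumerates the state space of a polymorphism-aware model: four fixed states (one per nucleotide) plus, for every pair of nucleotides, the states in which the two alleles are present at counts i and N - i in a virtual population of size N. The virtual population size N is not mentioned in the abstract and is introduced here as an assumption, as are the function name and the tuple representation of states.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def pomo_states(n):
    """List the states of a polymorphism-aware model with virtual
    population size n.

    A fixed state is ((base, n),). A polymorphic state is
    ((base_a, i), (base_b, n - i)) with 1 <= i <= n - 1.
    """
    # Four monomorphic states: the population is fixed for one base.
    states = [((base, n),) for base in NUCLEOTIDES]
    # Biallelic states for each of the six unordered nucleotide pairs.
    for base_a, base_b in combinations(NUCLEOTIDES, 2):
        for i in range(1, n):
            states.append(((base_a, i), (base_b, n - i)))
    return states


states = pomo_states(10)
print(len(states))   # 4 fixed + 6 * 9 polymorphic states
print(states[0], states[4])
```

The number of states is 4 + 6(N - 1), so it grows only linearly with N, which is one reason such a state space stays tractable for likelihood calculations.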