Capturing heterotachy through multi-gamma site models

Remco Bouckaert, Peter Lockhart
doi: http://dx.doi.org/10.1101/018101

Most methods for performing a phylogenetic analysis based on sequence alignments of gene data assume that the mechanism of evolution is constant through time. It is recognised that some sites do evolve somewhat faster than others, and this can be captured using a (gamma) rate heterogeneity model. Further, some species have shorter replication times than others, and this results in faster rates of substitution in some lineages. This feature of lineage-specific rate variation can be captured to some extent by using relaxed clock models. However, it is also clear that there are additional poorly characterised features of sequence data that can sometimes lead to extreme differences in lineage-specific rates. This variation is poorly captured by constant time reversible substitution models. The significance of extreme lineage-specific rate differences is that they lead both to errors in reconstructing evolutionary relationships and to biased estimates for the age of ancestral nodes. We propose a new model that allows gamma rate heterogeneity to change on branches, thus offering a more realistic model of sequence evolution. It adds negligible computational cost to likelihood calculations. We illustrate its effectiveness with an example of green algae and land-plants. For many real world data sets, we find a much better fit with multi-gamma site models as well as substantial differences in ancestral node date estimates.

Phylogenetic analysis supports a link between DUF1220 domain number and primate brain expansion

Fabian Zimmer, Stephen H Montgomery
doi: http://dx.doi.org/10.1101/018077

The expansion of DUF1220 domain copy number during human evolution is a dramatic example of rapid and repeated domain duplication. However, the phenotypic relevance of DUF1220 dosage is unknown. Although patterns of expression, homology and disease associations suggest a role in cortical development, this hypothesis has not been robustly tested using phylogenetic methods. Here, we estimate DUF1220 domain counts across 12 primate genomes using a nucleotide Hidden Markov Model. We then test a series of hypotheses designed to examine the potential evolutionary significance of DUF1220 copy number expansion. Our results suggest a robust association with brain size, and more specifically neocortex volume. In contradiction to previous hypotheses, we find a strong association with postnatal brain development, but not with prenatal brain development. Our results provide further evidence of a conserved association between specific loci and brain size across primates, suggesting human brain evolution occurred through a continuation of existing processes.

Phylogenomic analyses support traditional relationships within Cnidaria

Felipe Zapata, Freya E Goetz, Stephen A Smith, Mark Howison, Stefan Siebert, Samuel Church, Steven M Sanders, Cheryl Lewis Ames, Catherine S McFadden, Scott C France, Marymegan Daly, Allen G Collins, Steven HD Haddock, Casey Dunn, Paulyn Cartwright
doi: http://dx.doi.org/10.1101/017632

Cnidaria, the sister group to Bilateria, is a highly diverse group of animals in terms of morphology, lifecycles, ecology, and development. How this diversity originated and evolved is not well understood because phylogenetic relationships among major cnidarian lineages are unclear, and recent studies present contrasting phylogenetic hypotheses. Here, we use transcriptome data from 15 newly-sequenced species in combination with 26 publicly available genomes and transcriptomes to assess phylogenetic relationships among major cnidarian lineages. Phylogenetic analyses using different partition schemes and models of molecular evolution, as well as topology tests for alternative phylogenetic relationships, support the monophyly of Medusozoa, Anthozoa, Octocorallia, Hydrozoa, and a clade consisting of Staurozoa, Cubozoa, and Scyphozoa. Support for the monophyly of Hexacorallia is weak due to the equivocal position of Ceriantharia. Taken together, these results further resolve deep cnidarian relationships, largely support traditional phylogenetic views on relationships, and provide a historical framework for studying the evolutionary processes involved in one of the most ancient animal radiations.

How complexity originates: The evolution of animal eyes

Todd H Oakley, Daniel I Speiser
doi: http://dx.doi.org/10.1101/017129

Learning how complex traits like eyes originate is fundamental for understanding evolution. Here, we first sketch historical perspectives on trait origins and argue that new technologies offer key new insights. Next, we articulate four open questions about trait origins. To address them, we define a research program to break complex traits into components and study the individual evolutionary histories of those parts. By doing so, we can learn when the parts came together and perhaps understand why they stayed together. We apply the approach to five structural innovations critical for complex eyes, reviewing the history of the parts of each of those innovations. Photoreceptors evolved within animals by bricolage, recombining genes that originated far earlier. Multiple genes used in eyes today had ancestral roles in stress responses. We hypothesize that photo-stress could have increased the chance those genes were expressed together in places on animals where light was abundant.

Calculating the Human Mutation Rate by Using a NUMT from the Early Oligocene

Ian Logan
doi: http://dx.doi.org/10.1101/016428

As the number of whole genomes available for study increases, so also does the opportunity to find unsuspected features hidden within our genetic code. One such feature allows an estimate of the Human Mutation Rate in human chromosomes to be made. A NUMT is a small fragment of the mitochondrial DNA that enters the nucleus of a cell, gets captured by a chromosome and is thereafter passed on from generation to generation. Over the millions of years of evolution, this unexpected phenomenon has happened many times. But it is usually very difficult to say just when a NUMT might have been created. However, this paper presents evidence to show that for one particular NUMT the date of formation was around 29 million years ago, which places the event in the Early Oligocene, when our ancestors were small monkey-like creatures. So now all of us carry this NUMT in each of our cells, as do Old World Monkeys, the Great Apes and our nearest relations, the Chimpanzees. The estimate of the Human Mutation Rate obtained by the method outlined here gives a value which is higher than has been generally found, but this new value perhaps only applies to non-coding regions of the Human genome where there is little, if any, selection pressure against new mutations.

Detecting hidden diversification shifts in models of trait-dependent speciation and extinction

Jeremy M Beaulieu, Brian C O’Meara
doi: http://dx.doi.org/10.1101/016386

The distribution of diversity can vary considerably from clade to clade. Attempts to understand these patterns often employ state speciation and extinction models to determine whether the evolution of a particular novel trait has increased speciation rates and/or decreased extinction rates. It is still unclear, however, whether these models are uncovering important drivers of diversification, or whether they are simply pointing to more complex patterns involving many unmeasured and co-distributed factors. Here we describe an extension to the popular state speciation and extinction models that specifically accounts for the presence of unmeasured factors that could impact diversification rates estimated for the states of any observed trait. Specifically, our model, which we refer to as HiSSE (Hidden State Speciation and Extinction), assumes that related to each observed state in the model are “hidden” states whose diversification dynamics and transition rates are potentially distinct from those of the observed states in isolation. Under rigorous simulation tests and when applied to empirical data, we find that HiSSE performs reasonably well, and can at least detect net diversification rate differences between observed and hidden states. We also discuss the remaining issues with state speciation and extinction models in general, and the important ways in which HiSSE provides a more nuanced understanding of trait-dependent diversification.

PoMo: An Allele Frequency-based Approach for Species Tree Estimation

Nicola De Maio, Dominik Schrempf, Carolin Kosiol
doi: http://dx.doi.org/10.1101/016360

Incomplete lineage sorting can cause incongruencies between the overall species-level phylogenetic tree and the phylogenetic trees for individual genes or genomic segments. If these incongruencies are not accounted for, species tree estimation can suffer from several biases. Here, we present a simple maximum likelihood approach that accounts for ancestral variation and incomplete lineage sorting. We use a POlymorphisms-aware phylogenetic MOdel (PoMo) that we have recently shown to efficiently estimate mutation rates and fixation biases from within- and between-species variation data. We extend this model to perform efficient estimation of species trees. We test the performance of PoMo in several different scenarios of incomplete lineage sorting using simulations and compare it with existing methods both in accuracy and computational speed. In contrast to other approaches, our model does not use coalescent theory but is allele-frequency based. We show that PoMo is well suited for genome-wide species tree estimation and that on such data it is more accurate than previous approaches.