Comparing Evolutionary Rates Using An Exact Test for 2×2 Tables with Continuous Cell Entries

A. Morgan Thompson, M. Cyrus Maher, Lawrence H. Uricchio, Zachary A. Szpiech, Ryan D. Hernandez
(Submitted on 11 Apr 2014)

Assessing the statistical significance of an observed 2×2 contingency table can easily be accomplished using Fisher’s exact test (FET). However, if the cell entries are continuous or represent values inferred from a continuous parametric model, then FET cannot be applied. Such tables arise frequently in areas of biostatistical research including population genetics and evolutionary genomics, where cell entries are estimated by computational methods and therefore take values on the non-negative real line ℝ⁺. Simply rounding cell entries to conform to the assumptions of FET is an ill-suited approach that, as we show, causes problems with both type-I and type-II errors. Pearson’s χ² test for independence, while technically applicable, is often ineffective for these tables, because the test has several limiting assumptions that make it inadvisable in many common instances (particularly with small cell entries). Here we develop a novel method for tables with continuous entries, which we term the continuous Fisher’s exact test (cFET). Through simulations, we show that cFET has a close-to-uniform distribution of p-values under the null hypothesis of independence, and more power than FET applied to rounded cell entries when the null hypothesis is false. We apply cFET to an example from comparative genomics to confirm an overall increased evolutionary rate among primates compared to rodents, and identify several genes that show particularly elevated evolutionary rates in primates. Some of these genes exhibit signatures of continued positive selection along the human lineage since our divergence from chimpanzees 5–7 million years ago, as well as ongoing selection in modern humans.
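
The baseline that cFET is compared against, FET applied to rounded cell entries, can be sketched in plain Python. This is a minimal illustration of that baseline, not the authors' cFET. It computes the standard two-sided Fisher p-value by summing the hypergeometric probabilities of all tables (with the observed margins) that are no more likely than the observed one, which is the convention used by R's fisher.test. Rounding with Python's built-in round is an assumption about how entries are rounded, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Sum over all tables no more likely than the observed one; the small relative
    # tolerance keeps floating-point near-ties from being dropped.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def fet_on_rounded(table: tuple[float, float, float, float]) -> float:
    """Round continuous cell entries to integers, then apply the ordinary FET."""
    a, b, c, d = (round(v) for v in table)
    return fisher_exact_two_sided(a, b, c, d)


# Two different continuous tables collapse to the same integer table [[2, 8], [10, 1]].
print(fet_on_rounded((2.4, 7.6, 9.5, 1.2)))
print(fet_on_rounded((1.6, 8.4, 10.4, 0.6)))
```

The two calls print the same p-value even though the continuous tables differ. This shows the information that rounding discards.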

Selection signatures in worldwide Sheep populations

Maria-Ines Fariello, Bertrand Servin, Gwenola Tosser-Klopp, Rachelle Rupp, Carole Moreno, International Sheep Genomics Consortium, Magali San Cristobal, Simon Boitard

The diversity of populations in domestic species offers great opportunities to study the genome's response to selection. The recently published Sheep HapMap dataset is a great example of a characterization of worldwide genetic diversity in sheep. In this study, we re-analyzed the Sheep HapMap dataset to identify selection signatures in worldwide sheep populations. Compared to previous analyses, we made use of statistical methods that (i) take into account the hierarchical structure of sheep populations, (ii) make use of linkage disequilibrium information and (iii) focus specifically on either recent or older selection signatures. We show that this allows us to pinpoint several new selection signatures in the sheep genome and to distinguish those related to modern breeding objectives from those related to earlier post-domestication constraints. The newly identified regions, together with the ones previously identified, reveal the extensive genome response to selection on morphology, color and adaptation to new environments.

Natural CMT2 variation is associated with genome-wide methylation changes and temperature adaptation

Xia Shen, Jennifer De Jonge, Simon Forsberg, Mats Pettersson, Zheya Sheng, Lars Hennig, Örjan Carlborg

As Arabidopsis thaliana has colonized a wide range of habitats across the world, it is an attractive model for studying the genetic mechanisms underlying environmental adaptation. Here, we used public data from two collections of A. thaliana accessions to associate genetic variability at individual loci with differences in climate at the sampling sites. We used a novel method to screen the genome for plastic alleles that tolerate a broader climate range than the major allele. This approach reduces confounding with population structure and increases power compared to standard genome-wide association methods. Sixteen novel loci were found, including an association between Chromomethylase 2 (CMT2) and variability in seasonal temperatures, where the plastic allele had reduced genome-wide CHH methylation. Cmt2 mutants were more tolerant to heat stress, suggesting genetic regulation of epigenetic modifications as a likely mechanism underlying natural adaptation to variable temperatures, potentially through differential allelic plasticity to temperature stress.

Majority rule has transition ratio 4 on Yule trees under a 2-state symmetric model

Elchanan Mossel, Mike Steel
(Submitted on 10 Apr 2014)

Inferring the ancestral state at the root of a phylogenetic tree from states observed at the leaves is a problem that arises in evolutionary biology. The simplest technique, majority rule, estimates the root state by the most frequently occurring state at the leaves. Alternative methods, such as maximum parsimony, explicitly take the tree structure into account. Since either method can outperform the other on particular trees, it is useful to consider the accuracy of the methods on trees generated under some evolutionary null model, such as a Yule pure-birth model. In this short note, we answer a recently posed question concerning the performance of majority rule on Yule trees under a symmetric 2-state Markovian substitution model of character state change. We show that majority rule is accurate precisely when the ratio of the birth (speciation) rate of the Yule process to the substitution rate exceeds the value 4. By contrast, maximum parsimony has been shown to be accurate only when this ratio is at least 6. Our proof relies on a second moment calculation, coupling, and a novel application of a reflection principle.
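
A small Monte Carlo sketch can explore the setting of this note numerically. It grows a Yule tree one speciation at a time, lets a symmetric 2-state character evolve along every lineage, and scores how often majority rule recovers the root state. This is an illustrative simulation, not the authors' proof technique. Three assumptions apply. First, the tree is stopped at the moment it reaches n leaves. Second, `sub_rate` is taken to be the rate at which a lineage switches state. This convention may differ from the note's, so the ratio-4 threshold need not line up numerically. Third, all function names are hypothetical.

```python
import math
import random
from collections import Counter


def majority_rule(leaf_states: list[int], rng: random.Random) -> int:
    """Estimate the root state as the most frequent leaf state (ties broken at random)."""
    counts = Counter(leaf_states)
    top = max(counts.values())
    return rng.choice(sorted(s for s, c in counts.items() if c == top))


def simulate_leaf_states(n_leaves: int, birth_rate: float, sub_rate: float,
                         root_state: int, rng: random.Random) -> list[int]:
    """Grow a Yule (pure-birth) tree to n_leaves lineages while a symmetric
    2-state character evolves along every lineage."""
    states = [root_state]
    while len(states) < n_leaves:
        k = len(states)
        dt = rng.expovariate(k * birth_rate)  # waiting time to the next speciation
        # Symmetric 2-state model: P(state differs after time dt) = (1 - e^(-2*mu*dt)) / 2.
        p_flip = (1 - math.exp(-2 * sub_rate * dt)) / 2
        states = [s ^ 1 if rng.random() < p_flip else s for s in states]
        parent = rng.randrange(k)          # each lineage is equally likely to speciate
        states.append(states[parent])      # the new daughter inherits the parent's state
    return states


def majority_rule_accuracy(n_leaves: int, birth_rate: float, sub_rate: float,
                           trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(majority rule recovers the root state)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        root = rng.randrange(2)
        leaves = simulate_leaf_states(n_leaves, birth_rate, sub_rate, root, rng)
        correct += majority_rule(leaves, rng) == root
    return correct / trials


for ratio in (2.0, 8.0):  # birth rate / substitution rate
    print(ratio, majority_rule_accuracy(51, birth_rate=ratio, sub_rate=1.0, trials=300))
```

Accuracies at a fixed, finite number of leaves only hint at the result. The threshold in the note concerns behaviour as the number of leaves grows.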

The relationships among GC content, nucleosome occupancy, and exon size

Liya Wang, Lincoln Stein, Doreen Ware
(Submitted on 9 Apr 2014)

The average size of internal translated exons, ranging from 120 to 165 nt across metazoans, is approximately the size of the typical mononucleosome (147 nt). Genome-wide studies have also shown that nucleosome occupancy is significantly higher in exons than in introns, which might indicate that the evolution of exon size is related to nucleosome occupancy. By grouping exons by the GC content of their flanking introns, we show that average exon size is positively correlated with GC content. Using sequencing data from direct mapping of Homo sapiens nucleosomes with limited nuclease digestion, we show that the level of nucleosome occupancy is also positively correlated with exon GC content in a similar fashion. We then show that exon size is positively correlated with nucleosome occupancy. The strong correlation between exon size and nucleosome occupancy suggests that chromatin organization may be related to the evolution of exon sizes.

Estimating Phylogeny from microRNA Data: A Critical Appraisal

Robert Thomson, David Plachetzki, Luke Mahler, Brian Moore

As progress toward a highly resolved tree of life continues to expose nodes that resist resolution, interest in new sources of phylogenetic information that are informative for these most difficult relationships continues to increase. One such potential source of information, the presence and absence of microRNA families, has been vigorously promoted as an ideal phylogenetic marker and has recently been deployed to resolve several long-standing phylogenetic questions. Understanding the utility of such markers for phylogenetic inference hinges on developing a better understanding of how such markers behave under suitable evolutionary models, as well as how they perform in real inference scenarios. However, no study has yet rigorously characterized the statistical behavior or utility of these markers. Here we examine the behavior and performance of microRNA presence/absence data under a variety of evolutionary models and reexamine datasets from several previous studies. We find that highly heterogeneous rates of microRNA gain and loss, pervasive secondary loss, and sampling error collectively render microRNA-based inference of phylogeny difficult, and fundamentally alter the conclusions for four of the five studies that we re-examine. Our results indicate that miRNA data have far less phylogenetic utility in resolving the tree of life than is currently recognized, and we urge considerable caution in their interpretation.

Bias and measurement error in comparative analyses: a case study with the Ornstein-Uhlenbeck model

Gavin Huw Thomas, Natalie Cooper, Chris Venditti, Andrew Meade, Robert P Freckleton

Phylogenetic comparative methods are increasingly used to give new insight into the patterns, causes and consequences of trait variation among species. The foundation of these methods is a suite of models that attempt to capture evolutionary patterns by extending the Brownian constant-variance model. However, the parameters of these models have been hypothesised to be biased, and to behave in a statistically predictable way only asymptotically, as datasets become large. This does not seem to be widely appreciated. We show that a commonly used model in evolutionary biology (the Ornstein-Uhlenbeck model) is biased over a wide range of conditions. Many studies fitting this model use datasets that are small and prone to substantial biases. Our results suggest that simulating fitted models and comparing the simulations with empirical results is critical when fitting OU and other extensions of the Brownian model.
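
The bias described here concerns the phylogenetic OU model. The same small-sample effect can be seen in a simpler stand-in: a single OU time series observed at equal spacing, with α estimated from the lag-1 regression slope φ = exp(−αΔt). This is a swapped-in illustration, not the authors' phylogenetic analysis. The parameter values (α = 0.5, σ = 1, θ = 0, Δt = 1) and the function names are arbitrary assumptions. In short series this slope is biased toward zero, so α tends to be overestimated; the estimates should approach 0.5 only as the series grows.

```python
import math
import random
import statistics
from typing import Optional


def simulate_ou(n: int, alpha: float, sigma: float, theta: float, dt: float,
                rng: random.Random) -> list[float]:
    """Simulate n equally spaced observations of an OU process (exact transitions)."""
    stat_sd = sigma / math.sqrt(2 * alpha)      # stationary standard deviation
    decay = math.exp(-alpha * dt)               # pull toward theta per step
    step_sd = stat_sd * math.sqrt(1 - decay ** 2)
    x = rng.gauss(theta, stat_sd)               # start from the stationary distribution
    xs = [x]
    for _ in range(n - 1):
        x = theta + (x - theta) * decay + rng.gauss(0.0, step_sd)
        xs.append(x)
    return xs


def estimate_alpha(xs: list[float], dt: float) -> Optional[float]:
    """Estimate alpha via the lag-1 regression slope phi = exp(-alpha * dt).

    Returns None when the fitted slope is non-positive (alpha undefined)."""
    prev, nxt = xs[:-1], xs[1:]
    mp, mn = statistics.fmean(prev), statistics.fmean(nxt)
    sxy = sum((a - mp) * (b - mn) for a, b in zip(prev, nxt))
    sxx = sum((a - mp) ** 2 for a in prev)
    phi = sxy / sxx
    return -math.log(phi) / dt if phi > 0 else None


def median_alpha_hat(n: int, alpha: float = 0.5, reps: int = 2000, seed: int = 1) -> float:
    """Median estimate of alpha over many simulated series of length n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        est = estimate_alpha(simulate_ou(n, alpha, 1.0, 0.0, 1.0, rng), dt=1.0)
        if est is not None:
            estimates.append(est)
    return statistics.median(estimates)


for n in (20, 100, 1000):
    print(n, round(median_alpha_hat(n, reps=500), 3))  # true alpha = 0.5
```

The median is reported rather than the mean because a few short series yield a non-positive slope, and α is undefined for those.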

Model adequacy and the macroevolution of angiosperm functional traits

Matthew Pennell, Richard G FitzJohn, William K Cornwell, Luke J Harmon

All models are wrong, and sometimes even the best of a set of models is useless. Modern phylogenetic comparative methods (PCMs) are almost exclusively model-based, and therefore making robust inferences from PCMs requires using a model of trait evolution that is a good explanation for the data. To date, researchers using PCMs have evaluated the explanatory power of a model only in terms of relative, not absolute, fit. Here we develop a general statistical framework for assessing the absolute fit, or adequacy, of phylogenetic models for the evolution of quantitative traits. We use our approach to test whether commonly used models are adequate descriptors of the macroevolutionary dynamics of real comparative data. We fit models of trait evolution to 337 comparative datasets covering three key angiosperm functional traits and evaluate the absolute fit of the models to each dataset. Overall, the models we used are very inadequate descriptions of the evolution of these traits; this was true for many different groups and at many different scales. Furthermore, the relative support for a model had very little to do with its absolute adequacy. We argue that assessing model adequacy should be a key step in comparative analyses.
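
One generic way to assess absolute rather than relative fit is a parametric bootstrap. You simulate data from the fitted model and ask whether a summary of the observed data looks typical of those simulations. The sketch below makes three assumptions. It assumes a Brownian-motion fit summarized by standardized independent contrasts, which are i.i.d. normal under BM. It uses the coefficient of variation of their absolute values as the test statistic. And it feeds in hypothetical contrast values. The statistic and the function names are illustrative choices, not necessarily the test statistics used in this study.

```python
import random
import statistics


def cv_abs(values: list[float]) -> float:
    """Coefficient of variation of absolute values: large when a few contrasts dominate."""
    abs_vals = [abs(v) for v in values]
    return statistics.stdev(abs_vals) / statistics.fmean(abs_vals)


def adequacy_p_value(contrasts: list[float], n_sims: int = 999, seed: int = 0) -> float:
    """Parametric-bootstrap adequacy check for a Brownian-motion fit.

    Under an adequate BM model, standardized independent contrasts are i.i.d.
    Normal(0, sigma^2). Simulate contrast sets from the fitted model and report the
    fraction of simulated statistics at least as large as the observed one.
    """
    rng = random.Random(seed)
    n = len(contrasts)
    sigma_hat = (sum(c * c for c in contrasts) / n) ** 0.5  # rate estimate from the contrasts
    observed = cv_abs(contrasts)
    exceed = 0
    for _ in range(n_sims):
        simulated = [rng.gauss(0.0, sigma_hat) for _ in range(n)]
        exceed += cv_abs(simulated) >= observed
    return (exceed + 1) / (n_sims + 1)  # add-one correction so p is never exactly 0


# Hypothetical contrasts: mostly tiny, with two very large ones (rate heterogeneity).
heavy = [0.01] * 18 + [10.0, -10.0]
print(adequacy_p_value(heavy))
```

The coefficient of variation does not depend on scale, so the fitted rate `sigma_hat` matters only if a scale-dependent statistic is swapped in. A small p-value flags a BM fit that may still be the best of a candidate set yet fails to describe the data.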

Sharing of Very Short IBD Segments between Humans, Neandertals, and Denisovans

Sharing of Very Short IBD Segments between Humans, Neandertals, and Denisovans

Gundula Povysil, Sepp Hochreiter

We analyze the sharing of very short identity by descent (IBD) segments between humans, Neandertals, and Denisovans to gain new insights into their demographic history. Short IBD segments convey information about events far back in time, because the shorter an IBD segment is, the older it is assumed to be. The identification of short IBD segments becomes possible through next-generation sequencing (NGS), which offers high variant density and reports variants of all frequencies. However, HapFABIA, the first method for detecting very short IBD segments in NGS data, has only recently been proposed. HapFABIA uses rare variants to identify IBD segments with a low false discovery rate. We applied HapFABIA to the 1000 Genomes Project whole-genome sequencing data to identify IBD segments that are shared within and between populations. Some IBD segments are shared with the reconstructed ancestral genome of humans and other primates. These segments are tagged by rare variants; consequently, some rare variants must be very old. Other IBD segments are also old, since they are shared with Neandertals or Denisovans, which explains their shorter lengths compared to segments that are not shared with these ancient genomes. The Denisova genome most prominently matched IBD segments that are shared by Asians. Many of these segments were found exclusively in Asians, and they are longer than segments shared between other continental populations and the Denisova genome. Therefore, we could confirm an introgression from Denisovans into the ancestors of Asians after their migration out of Africa. While Neandertal-matching IBD segments are most often shared by Asians, Europeans, too, share a considerably higher percentage of IBD segments with Neandertals than other populations do. Again, many of these Neandertal-matching IBD segments are found exclusively in Asians, whereas Neandertal-matching IBD segments that are shared by Europeans are often found in other populations, too.
Neandertal-matching IBD segments that are shared by Asians or Europeans are longer than those observed in Africans. This hints at gene flow from Neandertals into the ancestors of Asians and Europeans after they left Africa. Interestingly, many Neandertal- or Denisova-matching IBD segments are predominantly observed in Africans, some of them even exclusively. IBD segments shared between Africans and Neandertals or Denisovans are strikingly short, so we assume that they are very old. This may indicate that these segments stem from ancestors of humans, Neandertals, and Denisovans and have survived in Africans.
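
The intuition that shorter IBD segments are older can be made concrete under textbook simplifying assumptions. Recombination is treated as a Poisson process along the chromosome with no crossover interference, and mutation and genotyping error are ignored. This is an illustration of that intuition, not the model used by HapFABIA. If two haplotypes share an ancestor t generations ago, 2t meioses separate them, and the segment around a given locus ends at the first crossover on each side:

```latex
L = L_{\text{left}} + L_{\text{right}}, \qquad
L_{\text{left}},\, L_{\text{right}} \sim \operatorname{Exp}(2t) \ \text{(in Morgans)}, \qquad
\mathbb{E}[L] = \frac{2}{2t} = \frac{1}{t}\ \text{Morgans} = \frac{100}{t}\ \text{cM}.
```

For example, t = 1000 generations gives a mean of 0.1 cM. Segments that trace back many thousands of generations are therefore expected to be very short.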

Intermediate Migration Yields Optimal Adaptation in Structured, Asexual Populations

Intermediate Migration Yields Optimal Adaptation in Structured, Asexual Populations

Arthur Covert III, Claus O Wilke

Most evolving populations are subdivided into multiple subpopulations connected to each other by varying levels of gene flow. However, how population structure and gene flow (i.e., migration) affect adaptive evolution is not well understood. Here, we studied the impact of migration on asexually reproducing evolving computer programs (digital organisms). We found that digital organisms evolve the highest fitness values at intermediate migration rates, and we tested three hypotheses that could potentially explain this observation: (i) migration promotes passage through fitness valleys, (ii) migration increases genetic variation, and (iii) migration reduces clonal interference through a process called “leapfrogging”. We found that migration had no appreciable effect on the number of fitness valleys crossed and that genetic variation declined monotonically with increasing migration rates, instead of peaking at the optimal migration rate. However, the number of leapfrogging events, in which a superior beneficial mutation emerges on a genetic background that predates the previously best genotype in the population, did peak at the optimal migration rate. We thus conclude that in structured, asexual populations, intermediate migration rates allow for optimal exploration of multiple, distinct fitness peaks, and thus yield the highest long-term adaptive success.