Gene Ontology: Pitfalls, Biases, Remedies

Pascale Gaudet, Christophe Dessimoz

The Gene Ontology (GO) is a formidable resource, but several considerations are essential to understanding the data and interpreting it correctly. The GO is sufficiently simple that it can be used without deep understanding of its structure or how it is developed, which is both a strength and a weakness. In this chapter, we discuss some common misinterpretations of the ontology and the annotations. A better understanding of the pitfalls and the biases in the GO should help users make the most of this very rich resource. We review some of the misconceptions and misleading assumptions commonly made about GO, including the effect of data incompleteness, the importance of annotation qualifiers, and the transitivity or lack thereof associated with different ontology relations. We also discuss several biases that can confound aggregate analyses such as gene enrichment analyses. For each of these pitfalls and biases, we suggest remedies and best practices.

Primer on the Gene Ontology

Pascale Gaudet, Nives Škunca, James C. Hu, Christophe Dessimoz

The Gene Ontology (GO) project is the largest resource for cataloguing gene function. The combination of solid conceptual underpinnings and a practical set of features has made the GO a widely adopted resource in the research community and an essential resource for data analysis. In this chapter, we provide a concise primer for all users of the GO. We briefly introduce the structure of the ontology and explain how to interpret annotations associated with the GO.

On Determining if Tree-based Networks Contain Fixed Trees

Maria Anaya, Olga Anipchenko-Ulaj, Aisha Ashfaq, Joyce Chiu, Mahedi Kaiser, Max Shoji Ohsawa, Megan Owen, Ella Pavlechko, Katherine St. John, Shivam Suleria, Keith Thompson, Corrine Yap

We address an open question of Francis and Steel about phylogenetic networks and trees. They give a polynomial-time algorithm to decide if a phylogenetic network, N, is tree-based and pose the problem: given a fixed tree T and network N, is N based on T? We show that this problem is NP-hard, by reduction from 3-Dimensional Matching (3DM), and further, that it is fixed-parameter tractable.

Translational plasticity facilitates the accumulation of nonsense genetic variants in the human population

Sujatha Jagannathan, Robert K. Bradley

An Extended Maximum Likelihood Inference of Geographic Range Evolution by Dispersal, Local Extinction and Cladogenesis

Champak Beeravolu Reddy, Fabien Condamine

Evolutionary dynamics of selfish DNA generates pseudo-linguistic features of genomes

Michael Sheinman, Anna Ramisch, Florian Massip, Peter F. Arndt
(Submitted on 4 Feb 2016)

Since the sequencing of large genomes, many statistical features of their sequences have been found. One intriguing feature is that certain subsequences are much more abundant than others. In fact, abundances of subsequences of a given length are distributed with a scale-free power-law tail, resembling properties of human texts, such as Zipf’s law. Despite recent efforts, this phenomenon is still poorly understood. Here we find that selfish DNA elements, such as those belonging to the Alu family of repeats, dominate the power-law tail. Interestingly, for the Alu elements the power-law exponent increases with the length of the considered subsequences. Motivated by these observations, we develop a model of selfish DNA expansion. The predictions of this model qualitatively and quantitatively agree with the empirical observations. This allows us to estimate parameters for the process of selfish DNA spreading in a genome during its evolution. The obtained results shed light on how evolution of selfish DNA elements shapes non-trivial statistical properties of genomes.

PoPoolationTE2: comparative population genomics of transposable elements using Pool-Seq

Robert Kofler, Daniel Gomez-Sanchez, Christian Schloetterer

Fine-scale human population structure in southern Africa reflects ecological boundaries

Caitlin Uren, Minju Kim, Alicia R Martin, Dean Bobo, Christopher R Gignoux, Paul D van Helden, Marlo Moller, Eileen G Hoal, Brenna M Henn

The stasis that wasn’t: Adaptive evolution goes against phenotypic selection in a wild rodent population

Timothée Bonnet, Peter Wandeler, Glauco Camenisch, Erik Postma

Bayesian Node Dating based on Probabilities of Fossil Sampling Supports Trans-Atlantic Dispersal of Cichlid Fishes

Michael Matschiner, Zuzana Musilová, Julia M I Barth, Zuzana Starostová, Walter Salzburger, Mike Steel, Remco Bouckaert