An integrative statistical model for inferring strain admixture within clinical Plasmodium falciparum isolates

John D. O’Brien, Zamin Iqbal, Lucas Amenga-Etego
(Submitted on 29 May 2015)

Since the arrival of genetic typing methods in the late 1960s, researchers have puzzled over the clinical consequences of observed strain mixtures within clinical isolates of Plasmodium falciparum. We present a new statistical model that infers the number of strains present and the amount of admixture with the local population (panmixia) using whole-genome sequence data. The model provides a rigorous statistical approach to inferring these quantities as well as the proportions of the strains within each sample. Applied to 168 samples of whole-genome sequence data from northern Ghana, the model provides a significantly improved fit over models implementing simpler approaches to mixture for a large majority (129/168) of samples. We discuss the possible uses of this model as a window into within-host selection for clinical and epidemiological studies and outline possible means for experimental validation.

The effect of non-reversibility on inferring rooted phylogenies

S. Cherlin, T. M. W. Nye, R. J. Boys, S. E. Heaps, T. A. Williams, T. M. Embley
(Submitted on 29 May 2015)

Most phylogenetic models assume that the evolutionary process is stationary and reversible. As a result, the root of the tree cannot be inferred as part of the analysis because the likelihood of the data does not depend on the position of the root. Yet defining the root of a phylogenetic tree is a key component of phylogenetic inference because it provides a point of reference for polarising ancestor/descendant relationships and therefore interpreting the tree. In this paper we investigate the effect of relaxing the reversibility assumption and allowing the position of the root to be another unknown quantity in the model. We propose two hierarchical models which are centred on a reversible model but perturbed to allow non-reversibility. The models differ in the degree of structure imposed on the perturbations. The analysis is performed in the Bayesian framework using Markov chain Monte Carlo methods. We illustrate the performance of the two non-reversible models in analyses of simulated datasets using two types of topological priors. We then apply the models to a real biological dataset, the radiation of polyploid yeasts, for which there is a robust biological opinion about the root position. Finally we apply the models to a second biological dataset for which the rooted tree is controversial: the ribosomal tree of life. We compare the two non-reversible models and conclude that both are useful in inferring the position of the root from real biological datasets.
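
As a small illustration of the reversibility assumption discussed in this abstract: a substitution rate matrix Q with stationary distribution pi is reversible when pi[i] * Q[i][j] equals pi[j] * Q[j][i] for every pair of states. The sketch below is not the authors' hierarchical model; the helper names and the two toy matrices (a GTR-style matrix and a three-state cyclic chain) are assumptions chosen only to show one reversible and one non-reversible case.

```python
def is_stationary(pi, Q, tol=1e-12):
    """True if pi Q = 0, i.e. pi is a stationary distribution of Q."""
    n = len(pi)
    return all(abs(sum(pi[i] * Q[i][j] for i in range(n))) < tol
               for j in range(n))


def is_reversible(pi, Q, tol=1e-12):
    """True if detailed balance pi[i] Q[i][j] == pi[j] Q[j][i] holds."""
    n = len(pi)
    return all(abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) < tol
               for i in range(n) for j in range(i + 1, n))


def gtr_style_matrix(pi, S):
    """Reversible construction: Q[i][j] = S[i][j] * pi[j] with S symmetric."""
    n = len(pi)
    Q = [[S[i][j] * pi[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        Q[i][i] = -sum(Q[i])  # rows of a rate matrix sum to zero
    return Q


def cyclic_matrix(a, b):
    """Three-state chain with rate a around the cycle 0->1->2->0 and
    rate b in the opposite direction. The uniform distribution is
    stationary for any a and b, but detailed balance needs a == b."""
    return [[-(a + b), a, b],
            [b, -(a + b), a],
            [a, b, -(a + b)]]


# Hypothetical base frequencies and symmetric exchangeabilities.
pi4 = [0.1, 0.2, 0.3, 0.4]
S = [[0.0, 1.0, 2.0, 1.0],
     [1.0, 0.0, 1.0, 3.0],
     [2.0, 1.0, 0.0, 1.0],
     [1.0, 3.0, 1.0, 0.0]]
Q_rev = gtr_style_matrix(pi4, S)

uniform3 = [1.0 / 3.0] * 3
Q_nonrev = cyclic_matrix(1.0, 2.0)

print("GTR-style:", is_stationary(pi4, Q_rev), is_reversible(pi4, Q_rev))
print("cyclic   :", is_stationary(uniform3, Q_nonrev),
      is_reversible(uniform3, Q_nonrev))
```

In the cyclic example the process has a preferred direction of flow between states, which is the kind of asymmetry that can make the likelihood depend on where the root is placed.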

A Bayesian Approach for Detecting Mass-Extinction Events When Rates of Lineage Diversification Vary

Michael R. May, Sebastian Höhna, Brian R. Moore
doi: http://dx.doi.org/10.1101/020149

The paleontological record chronicles numerous episodes of mass extinction that severely culled the Tree of Life. Biologists have long sought to assess the extent to which these events may have impacted particular groups. We present a novel method for detecting mass-extinction events from phylogenies estimated from molecular sequence data. We develop our approach in a Bayesian statistical framework, which enables us to harness prior information on the frequency and magnitude of mass-extinction events. The approach is based on an episodic stochastic-branching process model in which rates of speciation and extinction are constant between rate-shift events. We model three types of events: (1) instantaneous tree-wide shifts in speciation rate; (2) instantaneous tree-wide shifts in extinction rate; and (3) instantaneous tree-wide mass-extinction events. Each type of event is described by a separate compound Poisson process (CPP) model, where the waiting times between successive events are exponentially distributed with event-specific rate parameters. The magnitude of each event is drawn from an event-type specific prior distribution. Parameters of the model are then estimated using a reversible-jump Markov chain Monte Carlo (rjMCMC) algorithm. We demonstrate via simulation that this method has substantial power to detect the number of mass-extinction events and provides unbiased estimates of the timing of mass-extinction events, while exhibiting an appropriate (i.e., below 5%) false discovery rate even in the case of background diversification rate variation. Finally, we provide an empirical application of this approach to conifers, which reveals that this group has experienced two major episodes of mass extinction. This new approach, the CPP on Mass Extinction Times (CoMET) model, provides an effective tool for identifying mass-extinction events from molecular phylogenies, even when the history of those groups includes more prosaic temporal variation in diversification rate.
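
The compound Poisson process described in this abstract can be sketched as a forward simulation: event times arise from exponentially distributed waiting times, and each event receives a magnitude drawn from a prior. This is only an illustration of that prior structure, not the CoMET implementation or its rjMCMC sampler. The function names, the rate of 0.02 events per unit time, the tree age of 100, and the Beta(2, 18) distribution on survival probability are all assumptions made for the example.

```python
import random


def simulate_cpp(rate, tree_age, draw_magnitude, rng):
    """Draw (time, magnitude) pairs from a compound Poisson process
    on the interval (0, tree_age).

    Waiting times between successive events are exponential with the
    given rate; each magnitude comes from draw_magnitude(rng).
    """
    events = []
    t = rng.expovariate(rate)
    while t < tree_age:
        events.append((t, draw_magnitude(rng)))
        t += rng.expovariate(rate)
    return events


def survival_probability(rng):
    """Hypothetical prior on the fraction of lineages surviving a
    mass-extinction event (Beta(2, 18) has mean 0.1)."""
    return rng.betavariate(2.0, 18.0)


rng = random.Random(2015)
for time, survival in simulate_cpp(0.02, 100.0, survival_probability, rng):
    print(f"event at t = {time:6.2f}, survival probability = {survival:.3f}")
```

Under these assumed settings the expected number of events per draw is rate * tree_age = 2, so many draws contain few or no events.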

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility

Michael DeGiorgio, Christian D. Huber, Melissa J. Hubisz, Ines Hellmann, Rasmus Nielsen
Subjects: Populations and Evolution (q-bio.PE)

SweepFinder is a popular program that implements a powerful likelihood-based method for detecting recent positive selection, or selective sweeps. Here, we present SweepFinder2, an extension of SweepFinder with increased sensitivity and robustness to the confounding effects of mutation rate variation and background selection, as well as increased flexibility that enables the user to examine genomic regions in greater detail and to specify a fixed distance between test sites. Moreover, SweepFinder2 enables the use of invariant sites for sweep detection, increasing both its power and precision relative to SweepFinder.

Detection and interpretation of shared genetic influences on 40 human traits

Joseph Pickrell, Tomaz Berisa, Laure Segurel, Joyce Y Tung, David Hinds
doi: http://dx.doi.org/10.1101/019885

We performed a genome-wide scan for genetic variants that influence multiple human phenotypes by comparing large genome-wide association studies (GWAS) of 40 traits or diseases, including anthropometric traits (e.g. nose size and male pattern baldness), immune traits (e.g. susceptibility to childhood ear infections and Crohn’s disease), metabolic phenotypes (e.g. type 2 diabetes and lipid levels), and psychiatric diseases (e.g. schizophrenia and Parkinson’s disease). First, we identified 307 loci (at a false discovery rate of 10%) that influence multiple traits (excluding “trivial” phenotype pairs like type 2 diabetes and fasting glucose). Several loci influence a large number of phenotypes; for example, variants near the blood group gene ABO influence eleven of these traits, including risk of childhood ear infections (rs635634: log-odds ratio = 0.06, P = 1.4 × 10^-8) and allergies (log-odds ratio = 0.05, P = 2.5 × 10^-8), among others. Similarly, a nonsynonymous variant in the zinc transporter SLC39A8 influences seven of these traits, including risk of schizophrenia (rs13107325: log-odds ratio = 0.15, P = 2 × 10^-12) and Parkinson’s disease (log-odds ratio = -0.15, P = 1.6 × 10^-7), among others. Second, we used these loci to identify traits that share multiple genetic causes. For example, genetic variants that delay age of menarche in women also, on average, delay age of voice drop in men, decrease body mass index (BMI), increase adult height, and decrease risk of male pattern baldness. Finally, we identified four pairs of traits that show evidence of a causal relationship. For example, we show evidence that increased BMI causally increases triglyceride levels, and that increased liability to hypothyroidism causally decreases adult height.
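
The effect sizes in this abstract are reported as log-odds ratios, which can be easier to read as odds ratios after exponentiating. The short sketch below only performs that arithmetic on the four values quoted above. The function name and the dictionary labels are made up for this example, and the sketch assumes natural logarithms, which the abstract does not state.

```python
import math


def odds_ratio(log_odds_ratio):
    """Convert a natural-log odds ratio to an odds ratio."""
    return math.exp(log_odds_ratio)


# Log-odds ratios exactly as quoted in the abstract.
reported = {
    "ABO region, childhood ear infections": 0.06,
    "ABO region, allergies": 0.05,
    "SLC39A8, schizophrenia": 0.15,
    "SLC39A8, Parkinson's disease": -0.15,
}

for label, log_or in reported.items():
    print(f"{label}: log-odds ratio {log_or:+.2f} -> "
          f"odds ratio {odds_ratio(log_or):.2f}")
```

A log-odds ratio of equal size and opposite sign, as for the two SLC39A8 associations, corresponds to reciprocal odds ratios: the same allele raises the odds of one disease by the factor by which it lowers the odds of the other.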

General formulation of Luria-Delbrück distribution of the number of mutants

Bahram Houchmandzadeh
doi: http://dx.doi.org/10.1101/019869

The Luria-Delbrück experiment is a cornerstone of evolutionary theory, demonstrating the randomness of mutations before selection. The distribution of the number of mutants in this experiment has been the subject of intense investigation during the last 70 years. Despite this considerable effort, most of the results have been obtained under the assumption of a constant growth rate, which is far from the experimental conditions. We derive here the properties of this distribution for an arbitrary growth function, for both the deterministic and stochastic growth of the mutants. The derivation we propose is surprisingly simple and versatile, allowing many generalizations to be taken easily into account.
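
For readers unfamiliar with the experiment, the sketch below simulates the classic setting in its simplest form: synchronous doubling at a constant rate, with mutations arising at random during division. It does not implement the arbitrary-growth-function results derived in the paper. The function name, the 12 generations, the mutation probability of 0.001, and the 300 cultures are arbitrary assumptions chosen to keep the run short.

```python
import random


def simulate_culture(generations, mu, rng):
    """Grow one culture from a single wild-type cell by synchronous
    doubling and return the final number of mutant cells.

    At each division a wild-type cell yields one wild-type daughter
    and a second daughter that is mutant with probability mu.
    Mutants breed true and double every generation.
    """
    wild, mutant = 1, 0
    for _ in range(generations):
        new_mutants = sum(1 for _ in range(wild) if rng.random() < mu)
        mutant = 2 * mutant + new_mutants
        wild = 2 * wild - new_mutants
    return mutant


rng = random.Random(1943)
counts = [simulate_culture(12, 1e-3, rng) for _ in range(300)]

mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean mutants per culture: {mean:.2f}")
print(f"variance across cultures: {variance:.2f}")
```

Because a mutation that happens early founds a large clone, the variance across cultures exceeds the mean, unlike the Poisson case in which the two are equal. That excess is the signature of mutations arising before selection.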

Stable eusociality via maternal manipulation when resistance is costless

Mauricio González-Forero
doi: http://dx.doi.org/10.1101/019877

In many eusocial species, workers develop or maintain their non-reproductive condition following maternal influence through aggression, differential feeding, or pheromones. This observation has suggested that eusociality may evolve from maternal manipulation where the mother induces offspring to take worker roles against their inclusive fitness interests. If manipulation is executed via aggression or poor feeding, offspring resistance to manipulation could be costly enough to be disfavored, allowing eusociality via manipulation to be evolutionarily stable. However, if manipulation is executed via pheromones, resistance could be less costly, in principle leading to evolutionarily unstable eusociality. Here I show that maternal manipulation can generate evolutionarily stable eusociality even if resistance has no direct costs provided that maternally neglected offspring use help more efficiently than maternally provisioned offspring (e.g., to regain survival). Manipulation temporarily creates ineffectively resisting helpers that allow the mother to reduce maternal care toward helped offspring. If maternally neglected offspring use help more efficiently, maternal care reduction produces offspring that benefit more from the ineffectively resisting helpers. Thus, maternal care reduction increases the average benefit received by helped offspring, bringing Hamilton’s rule to satisfaction and eliminating selection for resistance. Manipulation can then generate stable eusociality under smaller benefit-cost ratios than when manipulation is absent although resistance is costless. These results predict that eusociality where ignoring maternal influence is rather costless is likely to have originated from maternal manipulation if (1) maternally neglected offspring are highly efficient help users and (2) maternally provisioned offspring can only moderately increase their survival by being helped.

Origins of major archaeal clades do not correspond to gene acquisitions from bacteria

Mathieu Groussin, Bastien Boussau, Gergely Szöllősi, Laura Eme, Manolo Gouy, Céline Brochier-Armanet, Vincent Daubin
doi: http://dx.doi.org/10.1101/019851

In a recent article, Nelson-Sathi et al. [NS] report that the origins of Major Archaeal Lineages [MAL] correspond to massive group-specific gene acquisitions via horizontal gene transfer (HGT) from bacteria (Nelson-Sathi et al., 2015, Nature 517(7532):77-80). If correct, this would have fundamental implications for the process of diversification in microbes. However, a re-examination of these data and results shows that the methodology used by NS systematically inflates the number of genes acquired at the root of each MAL, and incorrectly assumes bacterial origins for these genes. A re-analysis of their data with appropriate phylogenetic models accounting for the dynamics of gene gain and loss between lineages supports the continuous acquisition of genes over long periods in the evolution of Archaea.

Chromosomal rearrangements as barriers to genetic homogenization between archaic and modern humans

Rebekah L. Rogers
(Submitted on 26 May 2015)

Chromosomal rearrangements, which shuffle DNA across the genome, are an important source of divergence across taxa that can modify gene expression and function. Using a paired-end read approach with Illumina sequence data for archaic humans, I identify changes in genome structure that occurred recently in human evolution. Hundreds of rearrangements indicate genomic trafficking between the sex chromosomes and autosomes, raising the possibility of sex-specific changes. Additionally, genes adjacent to genome structure changes in Neanderthals are associated with testis-specific expression, consistent with evolutionary theory that new genes commonly form with expression in the testes. I identify one case of new-gene creation through transposition from the Y chromosome to chromosome 10 that combines the 5′ end of the testis-specific gene Fank1 with previously untranscribed sequence. This new transcript experienced copy number expansion in archaic genomes, indicating rapid genomic change. Finally, loci containing genome structure changes show diminished rates of introgression from Neanderthals into modern humans, consistent with the hypothesis that rearrangements serve as barriers to gene flow during hybridization. Together, these results suggest that this previously unidentified source of genomic variation has important biological consequences in human evolution.

A Unified Architecture of Transcriptional Regulatory Elements

Robin Andersson, Albin Sandelin, Charles G Danko
doi: http://dx.doi.org/10.1101/019844

Gene expression is precisely controlled in time and space through the integration of signals that act at gene promoters and gene-distal enhancers. Classically, promoters and enhancers are considered separate classes of regulatory elements, often distinguished by histone modifications. However, recent studies have revealed broad similarities between enhancers and promoters, blurring the distinction: active enhancers often initiate transcription, and some gene promoters have the potential of enhancing transcriptional output of other promoters. Here, we propose a model in which promoters and enhancers are considered a single class of functional element, with a unified architecture for transcription initiation. The context of interacting regulatory elements, and surrounding sequences, determine local transcriptional output as well as the enhancer and promoter activities of individual elements.