The incorporation of ecological processes into models of trait evolution is important for understanding past drivers of evolutionary change. Species interactions have long been thought to be key drivers of trait evolution, yet models for comparative data that account for interactions between species are lacking. One of the challenges is that such models are intractable and difficult to express analytically. Here we present a phylogenetic model of trait evolution that incorporates interspecific competition. Competition is modelled as a tendency of sympatric species to evolve towards distinct niches, producing trait overdispersion and high phylogenetic signal. The model predicts elevated trait variance across species and a slowdown in evolutionary rate both across the clade and within each branch. It also predicts a reduction in correlation between otherwise correlated traits. We used an Approximate Bayesian Computation (ABC) approach to estimate model parameters. Using simulations, we tested the power of the model to detect deviations from Brownian trait evolution and found reasonable power to detect competition in sufficiently large (20+ species) trees. We applied the model to the evolution of bill morphology in Darwin’s finches and found evidence that competition affects the evolution of bill length.
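As a rough, hypothetical illustration of the kind of process the abstract describes (this is not the authors' model or code; the repulsion term and parameter names are illustrative assumptions), a Brownian trait walk with a push away from sympatric competitors might look like this:

```python
# Minimal sketch of Brownian trait evolution with competitive repulsion.
# NOT the authors' implementation; sigma, alpha and the sign-based repulsion
# term are illustrative assumptions.
import numpy as np

def simulate_competition(n_lineages=5, n_steps=1000, dt=0.01,
                         sigma=1.0, alpha=0.5, seed=None):
    """Each lineage follows Brownian motion plus a push away from the
    trait values of the other (sympatric) lineages."""
    rng = np.random.default_rng(seed)
    traits = np.zeros(n_lineages)
    for _ in range(n_steps):
        diffs = traits[:, None] - traits[None, :]        # pairwise trait differences
        repulsion = alpha * np.sign(diffs).mean(axis=1)  # move away from competitors
        noise = rng.normal(0.0, sigma * np.sqrt(dt), n_lineages)
        traits = traits + repulsion * dt + noise
    return traits

# With alpha = 0 this reduces to ordinary Brownian motion; increasing alpha
# overdisperses the trait values relative to the Brownian expectation.
print(simulate_competition(seed=1))
```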
A New Statistical Framework for Genetic Pleiotropic Analysis of High Dimensional Phenotype Data
Panpan Wang, Mohammad Rahman, Li Jin, Momiao Xiong
(Submitted on 3 Dec 2015)
Widely used methods for genetic pleiotropic analysis of multiple phenotypes are typically designed to examine the relationship between common variants and a small number of phenotypes; they are not suited to high dimensional phenotype and high dimensional genotype (next-generation sequencing) data. To overcome these limitations, we develop sparse structural equation models (SEMs) as a general framework for a new paradigm of genetic analysis of multiple phenotypes. To incorporate both common and rare variants into the analysis, we extend traditional multivariate SEMs to sparse functional SEMs. To deal with high dimensional phenotype and genotype data, we employ functional data analysis and the alternating direction method of multipliers (ADMM) to reduce data dimension and improve computational efficiency. Using large scale simulations, we show that the proposed methods have higher power to detect true causal genetic pleiotropic structure than existing methods. Simulations also demonstrate that gene-based pleiotropic analysis has higher power than single variant-based pleiotropic analysis. Applied to exome sequence data from the NHLBI Exome Sequencing Project (ESP) with 11 phenotypes, the proposed method identifies a network of 137 genes connected to the 11 phenotypes by 341 edges. Among them, 114 genes showed pleiotropic genetic effects, and 45 genes have been reported in the literature to be associated with phenotypes in the analysis or with other cardiovascular disease (CVD) related phenotypes.
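For readers unfamiliar with ADMM, the sketch below shows the alternating-direction idea on a plain lasso problem; it is a generic illustration of the optimisation technique named in the abstract, not the authors' sparse functional SEM estimator.

```python
# Generic ADMM illustration on a lasso problem:
#   minimise 0.5*||A x - b||^2 + lam*||x||_1.
# NOT the authors' sparse functional SEM method.
import numpy as np

def soft_threshold(v, kappa):
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=200):
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(n))       # factor once, reuse each iteration
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # smooth quadratic subproblem
        z = soft_threshold(x + u, lam / rho)               # sparsity-inducing subproblem
        u = u + x - z                                      # scaled dual update
    return z

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
beta = np.zeros(20); beta[:3] = 2.0                        # sparse ground truth
b = A @ beta + 0.1 * rng.normal(size=50)
print(np.round(admm_lasso(A, b, lam=2.0), 2))
```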
Bayesian non-parametric inference for Λ-coalescents: consistency and a parametric method
Jere Koskela, Paul A. Jenkins, Dario Spanò
(Submitted on 3 Dec 2015)
We investigate Bayesian non-parametric inference for Λ-coalescent processes parametrised by probability measures on the unit interval, and provide an implementable, provably consistent MCMC inference algorithm. We give verifiable criteria on the prior for posterior consistency when observations form a time series, and prove that any non-trivial prior is inconsistent when all observations are contemporaneous. We then show that the likelihood given a data set of size n∈ℕ is constant across Λ-measures whose leading n−2 moments agree, and focus on inferring truncated sequences of moments. We provide a large class of functionals which can be extremised using finite computation given a credibility region of posterior truncated moment sequences, and a pseudo-marginal Metropolis-Hastings algorithm for sampling the posterior. Finally, we compare the efficiency of the exact and noisy pseudo-marginal algorithms with and without delayed acceptance acceleration using a simulation study.
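The pseudo-marginal Metropolis-Hastings idea mentioned in the abstract can be summarised in a few lines: replace the exact likelihood with a non-negative unbiased estimate and the chain still targets the correct posterior. The sketch below is a generic illustration of that idea, not the preprint's algorithm or model; the user-supplied `log_lik_estimate` and `log_prior` functions are assumptions.

```python
# Generic pseudo-marginal Metropolis-Hastings sketch (not the preprint's sampler).
import numpy as np

def pseudo_marginal_mh(log_lik_estimate, log_prior, theta0,
                       n_iter=5000, step=0.1, seed=None):
    """log_lik_estimate(theta, rng) must return the log of a non-negative,
    unbiased estimate of the likelihood at theta."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    ll = log_lik_estimate(theta, rng)
    samples = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        proposal = theta + step * rng.normal(size=theta.size)
        ll_prop = log_lik_estimate(proposal, rng)
        log_accept = (ll_prop + log_prior(proposal)) - (ll + log_prior(theta))
        if np.log(rng.uniform()) < log_accept:
            theta, ll = proposal, ll_prop   # the noisy estimate is kept with the state
        samples[i] = theta
    return samples
```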
Efficient recycled algorithms for quantitative trait models on phylogenies
Gordon Hiscott, Colin Fox, Matthew Parry, David Bryant
(Submitted on 2 Dec 2015)
We present an efficient and flexible method for computing likelihoods of phenotypic traits on a phylogeny. The method does not resort to Monte Carlo computation but instead blends Felsenstein’s discrete character pruning algorithm with methods for numerical quadrature. It is not limited to Gaussian models and adapts readily to model uncertainty in the observed trait values. We demonstrate the framework by developing efficient algorithms for likelihood calculation and ancestral state reconstruction under Wright’s threshold model, applying our methods to a dataset of trait data for extrafloral nectaries (EFNs) across a phylogeny of 839 Fabales species.
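As a toy illustration of the pruning-plus-quadrature idea (not the paper's algorithm; the tree encoding, grid, and Brownian liability transition here are assumptions), partial likelihoods of the latent liability can be held on a grid at each node and branch integrals evaluated numerically:

```python
# Toy sketch of blending Felsenstein pruning with numerical quadrature under a
# threshold model. Not the paper's algorithm.
import numpy as np
from scipy.stats import norm

GRID = np.linspace(-6, 6, 201)
W = np.full(GRID.size, GRID[1] - GRID[0])   # rectangle-rule quadrature weights
W[0] = W[-1] = W[0] / 2                     # trapezoid correction at the ends

def branch_kernel(t, sigma2=1.0):
    """K[i, j] = density of child liability GRID[j] given parent liability GRID[i]."""
    return norm.pdf(GRID[None, :], loc=GRID[:, None], scale=np.sqrt(sigma2 * t))

def partial_likelihood(node):
    """P(observed tip states below `node` | node's liability = GRID)."""
    if "state" in node:                      # tip: trait is 1 iff liability > 0
        return (GRID > 0).astype(float) if node["state"] == 1 else (GRID <= 0).astype(float)
    L = np.ones_like(GRID)
    for child, t in node["children"]:
        L = L * (branch_kernel(t) @ (W * partial_likelihood(child)))  # numerical branch integral
    return L

# Two-tip example: (A:1.0, B:1.0); root liability ~ N(0, 1).
tree = {"children": [({"state": 1}, 1.0), ({"state": 0}, 1.0)]}
root_prior = norm.pdf(GRID, 0.0, 1.0)
print(np.sum(W * root_prior * partial_likelihood(tree)))   # likelihood of the data
```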
Author post: Efficient coalescent simulation and genealogical analysis for large sample sizes
This guest post is by Jerome Kelleher, on the preprint by Kelleher, Etheridge, and McVean titled “Efficient coalescent simulation and genealogical analysis for large sample sizes”, available here from bioRxiv.
In this post we summarise the main results of our recent bioRxiv preprint. We’ve left out a lot of important details here, but hopefully this summary will be enough to convince you that it’s worth reading the paper!
Coalescent simulation is a fundamental tool in modern population genetics, and a large number of packages exist to simulate various aspects of the model. The basic algorithm to simulate the coalescent with recombination was defined by Hudson in 1983, who also published the classical ms simulation program in 2002. Programs such as ms that are based on Hudson’s algorithm perform poorly for longer sequences, making simulation of chromosome-sized regions under the influence of recombination impossible. The Sequentially Markov Coalescent (SMC) approximates the coalescent with recombination by assuming that each marginal genealogy depends only on its predecessor, making simulation much more efficient. However, the SMC can be a poor approximation when long-range linkage information is important, and current simulators do not scale well with sample size. Population-scale sequencing projects currently under way mean there is an urgent need for accurate simulations of hundreds of thousands of genomes.
We present a new formulation of Hudson’s simulation algorithm that solves these issues, making chromosome-scale simulation of the exact coalescent with recombination for hundreds of thousands of samples possible for the first time. Our approach begins by defining the genealogies that we are constructing in terms of integer vectors of a specific form, which we refer to as “sparse trees”. We generate recombination and common ancestor events in the same manner as the classical methods, but our approach to constructing marginal genealogies is quite different. When a coalescence within a marginal tree occurs, we store a tuple consisting of the left and right coordinates of the overlapping region, the parent and child nodes (which are integers), and the time at which the event occurred. We refer to these tuples as “coalescence records”, and they provide sufficient information to recover all genealogies after the simulation has completed. We implemented these ideas in a simulator called msprime, which we compared with the state of the art. For a fixed sample size of 1000 and increasing sequence length (with human-like recombination parameters), we found that msprime is much faster than comparable exact simulators and, surprisingly, is competitive with approximate SMC simulators. Even more surprisingly, we found that for a fixed sequence length of 50 megabases and increasing sample size, msprime was much faster than any existing simulator for large samples.
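A bare-bones sketch of what such a record might look like in Python (field names are illustrative, not msprime's actual storage schema):

```python
# Hypothetical sketch of a coalescence record as described above.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class CoalescenceRecord:
    left: float                 # left coordinate of the region this event covers
    right: float                # right coordinate (applies to trees on [left, right))
    parent: int                 # integer id of the parent node
    children: Tuple[int, ...]   # integer ids of the child nodes joined here
    time: float                 # time of the common ancestor event

# Node 4 becomes the parent of samples 0 and 1 over [0, 10) at time 0.3, and so on.
records = [
    CoalescenceRecord(0.0, 10.0, 4, (0, 1), 0.3),
    CoalescenceRecord(0.0, 10.0, 5, (2, 3), 0.5),
    CoalescenceRecord(0.0, 10.0, 6, (4, 5), 1.2),
]
```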
Storing the output of the simulations as coalescence records has major advantages. Because parent-child relationships shared by adjacent trees are stored only once, the correlation structure of the sequence of genealogies is explicit and the storage requirements minimised. To illustrate this, we ran a simulation of a 100 megabase chromosome with a roughly human recombination rate for a sample of 100,000 individuals. This simulation ran in about 6 minutes on a single CPU thread and used around 850MB of RAM. The resulting coalescence records required 88MB using msprime’s native HDF5 based storage format. Storing the same genealogies in Newick format requires around 3.5TB.
Highly compressed representations of data usually come at the cost of increased access time. In contrast, we can retrieve complete genealogical information from coalescence records many times faster than is possible using existing Newick-based methods. We provide a detailed listing of an algorithm to sequentially recover the marginal genealogies from a set of coalescence records and show that this algorithm requires constant time to transition between adjacent trees. This algorithm has been implemented as part of msprime’s Python API, and required around 3 seconds to iterate over all 1.1 million trees generated by the simulation above. We compared this performance to several popular tree processing libraries, and found that the fastest would require an estimated 38 days to parse the same set of trees in Newick format. Thus, in this example, by using msprime’s storage format and API we can store the same set of trees using around forty thousand times less space and parse them around a million times more quickly than Newick based methods.
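A rough sketch of that left-to-right sweep, reusing the hypothetical CoalescenceRecord class and `records` list from above (the algorithm listed in the paper and implemented in msprime is more careful, e.g. about the time-ordering of record applications):

```python
# Sweep across the sequence, applying records as their intervals begin and
# removing them as they end, so moving between adjacent trees only touches
# the records that actually change. Illustrative sketch, not msprime's code.
def iterate_trees(records, sequence_length):
    ins = sorted(records, key=lambda r: r.left)    # records waiting to be applied
    rem = sorted(records, key=lambda r: r.right)   # records waiting to be removed
    parent = {}                                    # child node id -> parent node id
    i = j = 0
    x = 0.0
    while x < sequence_length:
        while j < len(rem) and rem[j].right == x:  # drop records ending at x
            for c in rem[j].children:
                del parent[c]
            j += 1
        while i < len(ins) and ins[i].left == x:   # apply records starting at x
            for c in ins[i].children:
                parent[c] = ins[i].parent
            i += 1
        right = sequence_length                    # next breakpoint to the right of x
        if i < len(ins):
            right = min(right, ins[i].left)
        if j < len(rem):
            right = min(right, rem[j].right)
        yield (x, right), parent                   # `parent` is live; copy it if you keep it
        x = right

for interval, tree in iterate_trees(records, 10.0):
    print(interval, tree)
```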
We can also store mutational information in a natural and efficient way. If we have an infinite sites mutation that occurs on the parent branch of a particular node at a particular position on the sequence, then we simply store this (node, position) pair. This leads to a very compact representation of the combined genealogical and mutational state of a sample. We simulated 1.2 million infinite sites mutations on top of the genealogies generated earlier, which resulted in a 102MB HDF5 file containing the coalescence records and mutations. In contrast, the corresponding text haplotypes consumed 113GB of storage space. Associating mutations directly with tree nodes also allows us to perform some important calculations efficiently. We describe an efficient algorithm to count the total number of leaf nodes from a particular set below each node in the tree as we iterate over the sequence. This algorithm allows us to (for example) calculate allele frequencies within specific subsets in constant time. Many other applications of these techniques are possible.
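As a simple illustration of the leaf-counting idea (the paper's algorithm updates counts incrementally as records are applied and removed; walking up from every tracked sample, as below, is simpler but slower), one can count tracked samples beneath each node of a single marginal tree and read off allele frequencies directly from a (node, position) mutation:

```python
# Count tracked samples beneath each node of one marginal tree, then use the
# counts to get allele frequencies for mutations stored as (node, position)
# pairs. Illustrative sketch, not msprime's algorithm.
from collections import defaultdict

def tracked_leaf_counts(parent, tracked_samples):
    counts = defaultdict(int)
    for u in tracked_samples:
        counts[u] += 1
        while u in parent:            # climb to the root, crediting each ancestor
            u = parent[u]
            counts[u] += 1
    return counts

tree = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6}   # parent map for the example tree above
counts = tracked_leaf_counts(tree, tracked_samples=[0, 1, 2, 3])
mutation = (4, 3.7)                            # hypothetical (node, position) pair
print(counts[mutation[0]] / 4)                 # frequency of the mutation: 0.5
```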
The availability of faster and more accurate simulations may lead to interesting new applications, and so we conclude by discussing some potential applications of our work. Of particular interest is the possibility of inferring a set of coalescence records from real biological data, obtaining a compressed representation that can be efficiently processed. This is a very interesting and promising direction for future research.