Bayesian priors for tree calibration: Evaluating two new approaches based on fossil intervals

Ryan W Norris, Cory L Strope, David M McCandlish, Arlin Stoltzfus
doi: http://dx.doi.org/10.1101/014340

Background: Studies of diversification and trait evolution increasingly rely on combining molecular sequences and fossil dates to infer time-calibrated phylogenetic trees. Available calibration software provides many options for the shape of the prior probability distribution of ages at a node to be calibrated, but the question of how to assign a Bayesian prior from limited fossil data remains open. Results: We introduce two new methods for generating priors based upon (1) the interval between the two oldest fossils in a clade, i.e., the penultimate gap (PenG), and (2) the ghost lineage length (GLin), defined as the difference between the oldest fossils for each of two sister lineages. We show that PenG and GLin/2 are point estimates of the interval between the oldest fossil and the true age for the node. Furthermore, given either of these quantities, we derive a principled prior distribution for the true age. This prior is log-logistic, and can be implemented approximately in existing software. Using simulated data, we test these new methods against some other approaches. Conclusions: When implemented as approaches for assigning Bayesian priors, the PenG and GLin methods increase the accuracy of inferred divergence times, showing considerably more precision than the other methods tested, without significantly greater bias. When implemented as approaches to post-hoc scaling of a tree by linear regression, the PenG and GLin methods exhibit less bias than other methods tested. The new methods are simple to use and can be applied to a variety of studies that call for calibrated trees.
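The log-logistic prior described above can be illustrated with a minimal sketch. The abstract states only that the prior is log-logistic; the choices below (shape fixed at 1, scale set to the gap estimate so that the prior median equals PenG or GLin/2) are assumptions made for illustration, and the function names and fossil ages are hypothetical.

```python
import random


def log_logistic_cdf(t, scale, shape=1.0):
    """CDF of a log-logistic distribution; returns 0 for non-positive t."""
    if t <= 0:
        return 0.0
    return 1.0 / (1.0 + (t / scale) ** (-shape))


def log_logistic_quantile(p, scale, shape=1.0):
    """Inverse CDF for 0 <= p < 1."""
    return scale * (p / (1.0 - p)) ** (1.0 / shape)


def sample_node_age(oldest_fossil_age, gap_estimate, rng, shape=1.0):
    """Draw a node age as oldest fossil age plus a log-logistic gap.

    gap_estimate is PenG, or GLin / 2 for the ghost-lineage variant
    (assumed here to act as the scale, i.e. the median of the gap).
    """
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    return oldest_fossil_age + log_logistic_quantile(u, gap_estimate, shape)


# Hypothetical clade whose two oldest fossils are 52 and 47 Ma old.
oldest, second_oldest = 52.0, 47.0
pen_g = oldest - second_oldest           # penultimate gap
median_age = oldest + log_logistic_quantile(0.5, pen_g)
upper_95 = oldest + log_logistic_quantile(0.95, pen_g)
draws = [sample_node_age(oldest, pen_g, random.Random(i)) for i in range(5)]
```

Under these assumptions the median prior age sits one gap beyond the oldest fossil, while the upper quantiles stretch much further back, which reflects the heavy right tail of a log-logistic distribution with shape 1.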

Genetics of intra-species variation in avoidance behavior induced by a thermal stimulus in C. elegans

Rajarshi Ghosh, Joshua S Bloom, Aylia Mohammadi, Molly E Schumer, Peter Andolfatto, William S Ryu, Leonid Kruglyak
doi: http://dx.doi.org/10.1101/014290

Individuals within a species vary in their responses to a wide range of stimuli, partly as a result of differences in their genetic makeup. Relatively little is known about the genetic and neuronal mechanisms contributing to diversity of behavior in natural populations. By studying animal-to-animal variation in innate avoidance behavior to thermal stimuli in the nematode Caenorhabditis elegans, we uncovered genetic principles of how different components of a behavioral response can be altered in nature to generate behavioral diversity. Using a thermal pulse assay, we uncovered heritable variation in responses to a transient temperature increase. Quantitative trait locus mapping revealed that separate components of this response were controlled by distinct genomic loci. The loci we identified contributed to variation in components of thermal pulse avoidance behavior in an additive fashion. Our results show that the escape behavior induced by thermal stimuli is composed of simpler behavioral components that are influenced by at least six distinct genetic loci. The loci that decouple components of the escape behavior reveal a genetic system that allows independent modification of behavioral parameters. Our work sets the foundation for future studies of evolution of innate behaviors at the molecular and neuronal level.

Partitioning heritability by functional category using GWAS summary statistics

Hilary Kiyo Finucane, Brendan Bulik-Sullivan, Alexander Gusev, Gosia Trynka, Yakir Reshef, Po-Ru Loh, Verneri Anttila, Han Xu, Chongzhi Zang, Kyle Farh, Stephan Ripke, Felix Day, ReproGen Consortium, Schizophrenia Working Group of the Psychiatric Genomics Consortium, RACI Consortium, Shaun Purcell, Eli Stahl, Sara Lindstrom, John R.B. Perry, Yukinori Okada, Soumya Raychaudhuri, Mark Daly, Nick Patterson, Benjamin M. Neale, Alkes L. Price
doi: http://dx.doi.org/10.1101/014241

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here, we analyze a broad set of functional elements, including cell-type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits spanning a total of 1.3 million phenotype measurements. To enable this analysis, we introduce a new method for partitioning heritability from GWAS summary statistics while controlling for linked markers. This new method is computationally tractable at very large sample sizes, and leverages genome-wide information. Our results include a large enrichment of heritability in conserved regions across many traits; a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers; and many cell-type-specific enrichments including significant enrichment of central nervous system cell types in body mass index, age at menarche, educational attainment, and smoking behavior. These results demonstrate that GWAS can aid in understanding the biological basis of disease and provide direction for functional follow-up.
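As a small worked illustration of what "enrichment of heritability" in a functional category can mean, the sketch below assumes the common definition: the share of heritability attributed to a category divided by the share of SNPs that fall in it. The abstract does not spell out this formula, and the function name and numbers are hypothetical.

```python
def heritability_enrichment(category_h2, total_h2, category_snps, total_snps):
    """Enrichment = (proportion of heritability) / (proportion of SNPs).

    Assumed definition for illustration; a value above 1 means the category
    explains more heritability than its size alone would suggest.
    """
    if total_h2 <= 0 or total_snps <= 0 or category_snps <= 0:
        raise ValueError("totals and category SNP count must be positive")
    prop_h2 = category_h2 / total_h2
    prop_snps = category_snps / total_snps
    return prop_h2 / prop_snps


# Hypothetical numbers: a category holding 2.6% of SNPs and 25% of heritability.
example = heritability_enrichment(
    category_h2=0.10, total_h2=0.40, category_snps=26_000, total_snps=1_000_000
)
```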

Ancestry specific association mapping in admixed populations

Line Skotte, Thorfinn Sand S Korneliussen, Ida Moltke, Anders Albrechtsen
doi: http://dx.doi.org/10.1101/014001

As recently demonstrated in several genetic association studies, historically small and isolated populations can offer increased statistical power due to extended linkage disequilibrium and increased genetic drift over many generations. However, many such populations, like the Greenlandic Inuit population, have recently experienced substantial admixture with other populations, which can complicate the association studies. One important complication is that most current methods for performing association testing are based on the assumption that the effect of the tested genetic marker is the same regardless of ancestry. This is a reasonable assumption for a causal variant, but may not hold for the genetic markers that are tested in association studies, which are usually not causal. The effects of non-causal genetic markers depend on how strongly their presence correlates with the presence of the causal marker, and this may vary between ancestral populations because of different linkage disequilibrium patterns and allele frequencies. Motivated by this, we here introduce a new statistical method for association testing in recently admixed populations, where the effect sizes are allowed to depend on the ancestry of the allele. Our method does not rely on accurate inference of local ancestry, yet using simulations we show that in some scenarios it gives a dramatic increase in statistical power to detect associations. In addition, the method allows for testing for difference in effect size between ancestral populations, which can be used to determine if a SNP is causal. We demonstrate the usefulness of the method on data from the Greenlandic population.
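The idea of an ancestry-dependent effect size can be sketched with a deliberately simplified linear model. This is not the authors' method: it assumes the ancestry of each allele copy is known, whereas the abstract says the method does not rely on accurate local ancestry inference. All function names and data here are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ancestry_specific_effects(count_a, count_b, phenotype):
    """Least-squares fit of y = mu + beta_a * x_a + beta_b * x_b.

    x_a and x_b count copies of the tested allele carried on haplotypes
    of ancestry A and ancestry B (assumed known in this sketch).
    Returns [mu, beta_a, beta_b].
    """
    rows = [[1.0, float(xa), float(xb)] for xa, xb in zip(count_a, count_b)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Noise-free toy data in which the allele has a larger effect on ancestry A.
count_a = [0, 1, 2, 0, 1, 0, 1, 0]
count_b = [0, 0, 0, 1, 1, 2, 0, 1]
phenotype = [10.0 + 0.5 * xa + 0.1 * xb for xa, xb in zip(count_a, count_b)]
mu, beta_a, beta_b = fit_ancestry_specific_effects(count_a, count_b, phenotype)
```

A standard association test would force beta_a and beta_b to be equal; letting them differ is what allows a test for a difference in effect size between ancestral populations.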

Alternative splicing QTLs in European and African populations using Altrans, a novel method for splice junction quantification

Halit Ongen, Emmanouil T Dermitzakis
doi: http://dx.doi.org/10.1101/014126

With the advent of RNA-sequencing technology we now have the power to detect different types of alternative splicing and how DNA variation affects splicing. However, given the short read lengths used in most population-based RNA-sequencing experiments, quantifying transcripts accurately remains a challenge. Here we present a novel method, Altrans, for discovery of alternative splicing quantitative trait loci (asQTLs). To assess the performance of Altrans we compared it to Cufflinks, a well-established transcript quantification method. Simulations show that in the presence of transcripts absent from the annotation, Altrans performs better in quantifications than Cufflinks. We have applied Altrans and Cufflinks to the Geuvadis dataset, which comprises samples from European and African populations, and discovered (FDR = 1%) 1806 and 243 asQTLs with Altrans, and 1596 and 288 asQTLs with Cufflinks for Europeans and Africans, respectively. Although Cufflinks results replicated better across the two populations, this is likely due to the increased sensitivity of Altrans in detecting harder to detect associations. We show that, by discovering a set of asQTLs in a smaller subset of European samples and replicating these in the remaining larger subset of Europeans, both methods achieve similar replication levels (94% and 98% replication in Altrans and Cufflinks, respectively). We find that method-specific asQTLs are largely due to different types of alternative splicing events detected by each method. We overlapped the asQTLs with biochemically active regions of the genome and observed significant enrichments for many functional marks and variants in splicing regions, highlighting the biological relevance of the asQTLs identified. Altogether, we present a novel approach for discovering asQTLs that is a more direct assessment of splicing compared to other methods and is complementary to other transcript quantification methods.

Geometric constraints dominate the antigenic evolution of influenza H3N2 hemagglutinin

Austin G Meyer, Claus O Wilke
doi: http://dx.doi.org/10.1101/014183

We have carried out a comprehensive analysis of the determinants of human influenza A H3 hemagglutinin evolution, considering three distinct predictors of evolutionary variation at individual sites: solvent accessibility (as a proxy for protein fold stability and/or conservation), experimental epitope sites (as a proxy for host immune bias), and proximity to the receptor-binding region (as a proxy for protein function). We have found that these three predictors individually explain approximately 15% of the variation in site-wise dN/dS. However, the solvent accessibility and proximity predictors seem largely independent of each other, while the epitope sites are not. In combination, solvent accessibility and proximity explain 32% of the variation in dN/dS. Incorporating experimental epitope sites into the model adds only an additional 2 percentage points. We have also found that the historical H3 epitope sites, which date back to the 1980s and 1990s, show only weak overlap with the latest experimental epitope data, and we have defined a novel set of four epitope groups which are experimentally supported and cluster in 3D space. Finally, sites with dN/dS > 1, i.e., the sites most likely driving seasonal immune escape, are not correctly predicted by either historical or experimental epitope sites, but only by proximity to the receptor-binding region. In summary, proximity to the receptor-binding region, rather than host immune bias, seems to be the primary determinant of H3 immune-escape evolution.

Integrating crop growth models with whole genome prediction through approximate Bayesian computation

Frank Technow, Carlos D. Messina, L. Radu Totir, Mark Cooper
doi: http://dx.doi.org/10.1101/014100

Genomic selection, enabled by whole genome prediction (WGP) methods, is revolutionizing plant breeding. Existing WGP methods have been shown to deliver accurate predictions in the most common settings, such as prediction of across environment performance for traits with additive gene effects. However, prediction of traits with non-additive gene effects and prediction of genotype by environment interaction (GxE) continue to be challenging. Previous attempts to increase prediction accuracy for these particularly difficult tasks employed prediction methods that are purely statistical in nature. Augmenting the statistical methods with biological knowledge has been largely overlooked thus far. Crop growth models (CGMs) attempt to represent the functional relationships between plant physiology and the environment in the formation of yield and similar output traits of interest. Thus, they can explain the impact of GxE and certain types of non-additive gene effects on the expressed phenotype. Approximate Bayesian computation (ABC), a novel and powerful computational procedure, allows the incorporation of CGMs directly into the estimation of whole genome marker effects in WGP. Here we provide a proof of concept study for this novel approach and demonstrate its use with a simulated data set. We show that this novel approach can be considerably more accurate than the benchmark WGP method GBLUP in predicting performance in environments represented in the estimation set as well as in previously unobserved environments for traits determined by non-additive gene effects. We conclude that this proof of concept demonstrates that using ABC for incorporating biological knowledge in the form of CGMs into WGP is a very promising novel approach to improving prediction accuracy for some of the most challenging scenarios of interest to applied geneticists.
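To make the role of ABC concrete, here is a minimal rejection-sampling sketch. It is not the authors' implementation: the one-parameter `toy_growth_model` is a hypothetical stand-in for a crop growth model, and the uniform prior, noise level, distance measure, and tolerance are all assumptions chosen for illustration. The point is only that the simulator is used directly in place of an explicit likelihood.

```python
import math
import random


def toy_growth_model(effect, environment):
    """Hypothetical stand-in for a CGM: a saturating, non-additive response."""
    return environment * effect / (1.0 + effect)


def abc_rejection(observed, environments, n_draws, epsilon, rng, noise_sd=0.1):
    """Keep prior draws whose simulated yields land close to the observed ones."""
    accepted = []
    for _ in range(n_draws):
        effect = rng.uniform(0.0, 3.0)  # assumed uniform prior on the effect
        simulated = [
            toy_growth_model(effect, env) + rng.gauss(0.0, noise_sd)
            for env in environments
        ]
        # Root-mean-square distance between simulated and observed yields.
        distance = math.sqrt(
            sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)
        )
        if distance < epsilon:
            accepted.append(effect)
    return accepted


environments = [2.0, 4.0, 6.0, 8.0]
true_effect = 1.0
observed = [toy_growth_model(true_effect, env) for env in environments]
posterior = abc_rejection(
    observed, environments, n_draws=20000, epsilon=0.3, rng=random.Random(1)
)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted draws approximate the posterior for the effect parameter. A smaller `epsilon` gives a closer approximation at the cost of fewer accepted samples; in a whole genome setting the single effect would be replaced by a vector of marker effects.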

Empirical determinants of adaptive mutations in yeast experimental evolution

Celia Payen, Anna B Sunshine, Giang T Ong, Jamie L Pogachar, Wei Zhao, Maitreya J Dunham
doi: http://dx.doi.org/10.1101/014068

High-throughput sequencing technologies have enabled expansion of the scope of genetic screens to identify mutations that underlie quantitative phenotypes, such as fitness improvements that occur during the course of experimental evolution. This new capability has allowed us to describe the relationship between fitness and genotype at a level never possible before, and ask deeper questions, such as how genome structure, available mutation spectrum, and other factors drive evolution. Here we combined functional genomics and experimental evolution to first map on a genome scale the distribution of potential beneficial mutations available as a first step to an evolving population and then compare these to the mutations actually observed in order to define the constraints acting upon evolution. We first constructed a single-step fitness landscape for the yeast genome by using barcoded gene deletion and overexpression collections, competitive growth in continuous culture, and barcode sequencing. By quantifying the relative fitness effects of thousands of single-gene amplifications or deletions simultaneously we revealed the presence of hundreds of accessible evolutionary paths. To determine the actual mutation spectrum used in evolution, we built a catalog of >1000 mutations selected during experimental evolution. By combining both datasets, we were able to ask how and why evolution is constrained. We identified adaptive mutations in laboratory evolved populations, derived mutational signatures in a variety of conditions and ploidy states, and determined that half of the mutations accumulated positively affect cellular fitness. We also uncovered hundreds of potential beneficial mutations never observed in the mutational spectrum derived from the experimental evolution catalog and found that those adaptive mutations become accessible in the absence of the dominant adaptive solution. 
This comprehensive functional screen explored the set of potential adaptive mutations on one genetic background, and allows us for the first time at this scale to compare the mutational path with the actual, spontaneously derived spectrum of mutations.

Feller’s Contributions to Mathematical Biology

Ellen Baake, Anton Wakolbinger
(Submitted on 21 Jan 2015)

This is a review of William Feller’s important contributions to mathematical biology. The seminal paper [Feller 1951] “Diffusion processes in genetics” was particularly influential on the development of stochastic processes at the interface to evolutionary biology, and interesting ideas in this direction (including a first characterization of what is nowadays known as “Feller’s branching diffusion”) already shaped up in the paper [Feller 1939] (written in German) “The foundations of a probabilistic treatment of Volterra’s theory of the struggle for life”. Feller’s article “On fitness and the cost of natural selection” [Feller 1967] contains a critical analysis of the concept of “genetic load”.

Approximate statistical alignment by iterative sampling of substitution matrices

Joseph L. Herman, Adrienn Szabó, István Miklós, Jotun Hein
(Submitted on 19 Jan 2015)

We outline a procedure for jointly sampling substitution matrices and multiple sequence alignments, according to an approximate posterior distribution, using an MCMC-based algorithm. This procedure provides an efficient and simple method by which to generate alternative alignments according to their expected accuracy, and allows appropriate parameters for substitution matrices to be selected in an automated fashion. In the cases considered here, the sampled alignments with the highest likelihood have an accuracy consistently higher than alignments generated using the standard BLOSUM62 matrix.