The overdue promise of short tandem repeat variation for heritability
Maximilian Press, Keisha D. Carlson, Christine Queitsch
Short tandem repeat (STR) variation has been proposed as a major explanatory factor in the heritability of complex traits in humans and model organisms. However, we still struggle to incorporate STR variation into genotype-phenotype maps. Here, we review the promise of STRs in contributing to complex trait heritability and highlight the challenges that STRs pose due to their repetitive nature. We argue that STR variants are more likely than single nucleotide variants to have epistatic interactions, reiterate the need for targeted assays to accurately genotype STRs, and call for more appropriate statistical methods for detecting STR-phenotype associations. Lastly, somatic STR variation within individuals may serve as a readout of disease susceptibility and is thus potentially a valuable covariate for future association studies.
Power analysis of artificial selection experiments using efficient whole genome simulation of quantitative traits
Darren Kessner, John Novembre
Evolve and resequence studies combine artificial selection experiments with massively parallel sequencing technology to study the genetic basis for complex traits. In these experiments, individuals are selected for extreme values of a trait, causing alleles at quantitative trait loci (QTLs) to increase or decrease in frequency in the experimental population. We present a new analysis of the power of artificial selection experiments to detect and localize QTLs. This analysis uses a simulation framework that explicitly models whole genomes of individuals, quantitative traits, and selection based on individual trait values. We find that explicitly modeling QTLs produces qualitatively different insights than considering independent loci with constant selection coefficients. Specifically, we observe how interference between QTLs under selection affects the trajectories and lengthens the fixation times of selected alleles. We also show that a substantial portion of the genetic variance of the trait (50–100%) can be explained by detected QTLs in as few as 20 generations of selection, depending on the trait architecture and experimental design. Furthermore, we show that power depends crucially on the opportunity for recombination during the experiment. Finally, we show that power increases when founder haplotype information is leveraged to obtain allele frequency estimates.
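To make the contrast between "selection on a trait" and "constant per-locus selection coefficients" concrete, here is a minimal sketch of truncation selection on an additive quantitative trait. It is not the authors' simulation framework: the function name, all parameter defaults, and the model itself are hypothetical. In particular it assumes unlinked QTLs (free recombination) and equal effects, so it deliberately omits the linkage and QTL interference that the abstract identifies as crucial; it only shows how allele frequencies respond indirectly, through selection on individual trait values.

```python
import random
import statistics


def simulate_truncation_selection(n_ind=200, n_qtl=5, effect=1.0, init_freq=0.2,
                                  h2=0.5, keep_frac=0.2, generations=20, seed=1):
    """Return per-generation frequencies of the trait-increasing allele at each QTL.

    Simplified, hypothetical model: diploids, unlinked biallelic QTLs (free
    recombination), equal additive effects, Gaussian environmental noise, and
    truncation selection keeping the top `keep_frac` of individuals as parents.
    """
    rng = random.Random(seed)
    # Each individual is a list of (maternal, paternal) alleles per QTL; 1 = "+" allele.
    pop = [[(int(rng.random() < init_freq), int(rng.random() < init_freq))
            for _ in range(n_qtl)] for _ in range(n_ind)]
    # Choose environmental variance so the founder population has heritability ~h2.
    var_g = n_qtl * 2 * init_freq * (1 - init_freq) * effect ** 2
    sd_e = (var_g * (1 - h2) / h2) ** 0.5
    n_keep = max(2, int(keep_frac * n_ind))

    def allele_freqs(population):
        return [sum(ind[j][0] + ind[j][1] for ind in population) / (2 * len(population))
                for j in range(n_qtl)]

    trajectories = [allele_freqs(pop)]
    for _ in range(generations):
        # Phenotype = additive genetic value + environmental noise.
        pheno = [effect * sum(a + b for a, b in ind) + rng.gauss(0.0, sd_e) for ind in pop]
        ranked = sorted(range(n_ind), key=lambda i: pheno[i], reverse=True)
        parents = [pop[i] for i in ranked[:n_keep]]
        # Random mating among selected parents; each locus is transmitted independently.
        pop = [[(mom[j][rng.randrange(2)], dad[j][rng.randrange(2)]) for j in range(n_qtl)]
               for mom, dad in (rng.sample(parents, 2) for _ in range(n_ind))]
        trajectories.append(allele_freqs(pop))
    return trajectories


traj = simulate_truncation_selection()
print("start:", [round(p, 2) for p in traj[0]])
print("end:  ", [round(p, 2) for p in traj[-1]])
print("mean change:", round(statistics.fmean(traj[-1]) - statistics.fmean(traj[0]), 2))
```

Adding linked loci (explicit chromosomes with crossover positions) would be the natural next step for exploring the recombination effects the abstract reports.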
Powerful tests for multi-marker association analysis using ensemble learning
Multi-marker approaches are currently gaining considerable interest in genome-wide association studies and can enhance the power to detect new associations under certain conditions. Gene- and pathway-based association tests are increasingly viewed as useful complements to the more widely used single-marker association analyses, which have successfully uncovered numerous disease variants. A major drawback of single-marker methods is that they do not consider pairwise and higher-order interactions between variants. Here, we describe multivariate methods for gene- and pathway-based association analyses using phenotype predictions from machine learning algorithms. Instead of relying only on a linear or logistic regression model, we propose using ensembles of diverse machine learning algorithms to test multivariate associations. Because the true mathematical relationship between a phenotype and any group of genetic and clinical variables is unknown in advance and may be complex, this strategy provides a general and flexible framework for approximating that relationship across different sets of SNPs. We show how phenotype predictions from our method can be used to construct tests for SNP-set association analysis. We first apply our method to simulated datasets to demonstrate its power and correctness. We then apply it to previously studied asthma-related genes in two independent asthma cohorts to conduct association tests.
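The abstract does not say which learners are combined, how predictions become a test statistic, or how p-values are computed. The sketch below fills those gaps with assumptions, all hypothetical: two base learners (a crude additive model and k-nearest neighbours, which can capture SNP-by-SNP interactions) averaged into an ensemble, the cross-validated correlation between predicted and observed phenotype as the statistic, and a permutation p-value. It handles a quantitative phenotype only; a case-control trait such as asthma would need classifiers instead.

```python
import random
import statistics


def _pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def fit_additive(X, y):
    """Crude additive learner: sums per-SNP marginal least-squares fits."""
    ybar = statistics.fmean(y)
    fits = []
    for j in range(len(X[0])):
        xj = [row[j] for row in X]
        xbar = statistics.fmean(xj)
        sxx = sum((v - xbar) ** 2 for v in xj)
        sxy = sum((v - xbar) * (t - ybar) for v, t in zip(xj, y))
        fits.append((xbar, sxy / sxx if sxx > 0 else 0.0))
    return lambda row: ybar + sum(b * (v - m) for v, (m, b) in zip(row, fits))


def fit_knn(X, y, k=5):
    """k-nearest-neighbour learner on genotype vectors; can pick up interactions."""
    def predict(row):
        nearest = sorted((sum((a - b) ** 2 for a, b in zip(row, xi)), yi)
                         for xi, yi in zip(X, y))
        return statistics.fmean(yi for _, yi in nearest[:k])
    return predict


def cv_ensemble_score(X, y, n_folds=5):
    """Correlation between phenotype and out-of-fold ensemble (averaged) predictions."""
    preds = [0.0] * len(y)
    for f in range(n_folds):
        train = [i for i in range(len(y)) if i % n_folds != f]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        learners = [fit_additive(X_tr, y_tr), fit_knn(X_tr, y_tr)]
        for i in range(len(y)):
            if i % n_folds == f:
                preds[i] = statistics.fmean(model(X[i]) for model in learners)
    return _pearson(preds, y)


def snp_set_test(X, y, n_perm=99, seed=0):
    """Permutation p-value for association between a SNP set X and phenotype y."""
    rng = random.Random(seed)
    observed = cv_ensemble_score(X, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if cv_ensemble_score(X, y_perm) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical data: 100 individuals, 5 SNPs, phenotype driven by a SNP0 x SNP1 interaction.
rng = random.Random(42)
X = [[rng.choice((0, 1, 1, 2)) for _ in range(5)] for _ in range(100)]
y = [1.5 * row[0] * row[1] + rng.gauss(0.0, 1.0) for row in X]
score, p = snp_set_test(X, y, n_perm=49)
print(round(score, 2), p)
```

Averaging is the simplest way to combine learners; weighted combinations (stacking) are a common alternative, at the cost of an extra layer of cross-validation.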
LIMIX: genetic analysis of multiple traits
Christoph Lippert, Francesco Paolo Casale, Barbara Rakitsch, Oliver Stegle
Multi-trait mixed models have emerged as a promising approach for joint analyses of multiple traits. In principle, the mixed model framework is remarkably general. However, current methods support only a narrow range of tasks in order to optimize the necessary computations. Here, we present a multi-trait modeling framework that is both versatile and fast: LIMIX makes it possible to flexibly adapt mixed models to a broad range of applications with different observed and hidden covariates and varied study designs. To highlight the novel modeling aspects of LIMIX, we performed three very different genetic studies: a joint GWAS of correlated blood lipid phenotypes, a joint analysis of the expression levels of the multiple transcript isoforms of a gene, and pathway-based modeling of molecular traits across environments. In these applications, we show that LIMIX increases GWAS power and phenotype prediction accuracy, in particular when stepwise multi-locus regression is integrated into multi-trait models and when large numbers of traits are analyzed. An open source implementation of LIMIX is freely available at: https://github.com/PMBio/limix.
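For orientation, one common way to write a multi-trait linear mixed model is shown below. It is an assumed illustration, not LIMIX's documented parameterization, and the symbol names are ours. The phenotype matrix Y (N individuals by P traits) is stacked column-wise into a single vector:

```latex
\operatorname{vec}(Y) \sim \mathcal{N}\!\left( (I_P \otimes X)\,\operatorname{vec}(B),\;
  C_g \otimes K + C_n \otimes I_N \right)
```

Here X (N by Q) holds fixed covariates or a tested SNP with effects B (Q by P), K is an N by N genetic relatedness matrix, and C_g and C_n are P by P genetic and residual trait covariance matrices. The off-diagonal entries of C_g are what allow correlated traits to share information in a joint GWAS, and the Kronecker structure is what commonly keeps such models computationally tractable as the number of individuals grows.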
Adaptation to a novel predator in Drosophila melanogaster: How well are we able to predict evolutionary responses?
Michael DeNieu, William Pitchers, Ian Dworkin
Evolutionary theory is sufficiently well developed to allow for short-term prediction of evolutionary trajectories. In addition to the presence of heritable variation, prediction requires knowledge of the form of natural selection on relevant traits. While many studies estimate the form of natural selection, few examine the degree to which traits evolve in the predicted direction. In this study we examined the form of natural selection imposed by mantid predation on wing size and shape in the fruit fly, Drosophila melanogaster. We then evolved populations of D. melanogaster under predation pressure and examined the extent to which wing size and shape responded in the predicted direction. We demonstrate that wing form partially evolved along the vector predicted from selection, more so than in control lineages. Furthermore, we re-examined phenotypic selection after ~30 generations of experimental evolution. We observed that the magnitude of selection on wing size and shape was diminished in populations evolving with mantid predators, while for shape the direction of the selection vector differed from that of the ancestral population. We discuss these findings in the context of the predictability of evolutionary responses and the need for fully multivariate approaches.
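A standard way to turn "heritable variation plus the form of selection" into a predicted multivariate response is the multivariate breeder's equation, in which the predicted change in trait means is the additive genetic (co)variance matrix G times the selection gradient beta. The abstract does not state which predictor the authors used, so treat this as an assumed illustration. The sketch computes the predicted response and the angle between predicted and observed response vectors; the G matrix, gradient, and observed change are hypothetical numbers.

```python
import math


def predicted_response(G, beta):
    """Multivariate breeder's equation: predicted change in trait means, delta_z = G beta."""
    return [sum(g * b for g, b in zip(row, beta)) for row in G]


def angle_between(u, v):
    """Angle in degrees between two response vectors (0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


# Hypothetical two-trait example (e.g. wing size and one shape axis).
G = [[1.0, 0.3],
     [0.3, 0.5]]          # additive genetic (co)variance matrix
beta = [0.2, -0.1]        # selection gradient
delta_pred = predicted_response(G, beta)
delta_obs = [0.12, 0.05]  # hypothetical observed change in means
print(delta_pred, round(angle_between(delta_pred, delta_obs), 1))
```

With real wing-shape data the vectors have many more dimensions, which is one reason the abstract stresses fully multivariate approaches: a response can match the predicted direction on one axis while diverging on others.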
Effective Genetic Risk Prediction Using Mixed Models
David Golan, Saharon Rosset
(Submitted on 12 May 2014)
To date, efforts to produce high-quality polygenic risk scores from genome-wide studies of common disease have focused on estimating and aggregating the effects of multiple SNPs. Here we propose a novel statistical approach for genetic risk prediction based on random and mixed effects models. Our approach (termed GeRSI) circumvents the need to estimate the effect sizes of numerous SNPs by treating these effects as random, producing predictions that are consistently superior to the current state of the art, as we demonstrate in extensive simulations. When applying GeRSI to seven phenotypes from the WTCCC study, we confirm that the use of random effects is most beneficial for diseases known to be highly polygenic: hypertension (HT) and bipolar disorder (BD). For HT, the WTCCC data contain no significant associations; the best existing model yields an AUC of 54%, while GeRSI improves it to 59%. For BD, GeRSI improves the AUC from 55% to 62%. For individuals ranked in the top 10% of BD risk predictions, GeRSI substantially increases the BD relative risk from 1.4 to 2.5.
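The key idea, treating SNP effects as random rather than estimating each one, can be illustrated with a generic random-effects predictor (GBLUP, equivalently kernel ridge regression with a genomic relationship kernel), evaluated by AUC. This is not GeRSI itself, whose details the abstract does not give; the function names, the simulated data, and the case/control split by a phenotype threshold are all hypothetical. The shrinkage parameter is set from an assumed heritability as lambda = (1 - h2) / h2.

```python
import random
import statistics


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gblup_predict(X_train, y_train, X_test, h2):
    """Random-effects prediction: y_hat = mean + K_test,train (K + lam I)^-1 (y - mean)."""
    m = len(X_train[0])
    cols = [[row[j] for row in X_train] for j in range(m)]
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard monomorphic SNPs

    def std(row):
        return [(v - mu) / s for v, mu, s in zip(row, means, sds)]

    def kern(a, b):  # genomic relationship between standardized genotype vectors
        return sum(p * q for p, q in zip(a, b)) / m

    Z_tr = [std(r) for r in X_train]
    lam = (1.0 - h2) / h2  # ratio of residual to genetic variance
    n = len(Z_tr)
    K = [[kern(Z_tr[i], Z_tr[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    ybar = statistics.fmean(y_train)
    alpha = solve(K, [v - ybar for v in y_train])
    return [ybar + sum(kern(z, Z_tr[i]) * alpha[i] for i in range(n))
            for z in (std(r) for r in X_test)]


def auc(scores, labels):
    """AUC as the probability a random case outscores a random control (ties count 1/2)."""
    cases = [s for s, lab in zip(scores, labels) if lab]
    controls = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in cases for b in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical polygenic simulation: 50 causal SNPs, 150 training and 50 test individuals.
rng = random.Random(0)
n_train, n_test, m = 150, 50, 50
freqs = [rng.uniform(0.1, 0.9) for _ in range(m)]
effects = [rng.gauss(0.0, 1.0) for _ in range(m)]
X = [[(rng.random() < p) + (rng.random() < p) for p in freqs] for _ in range(n_train + n_test)]
g = [sum(e * x for e, x in zip(effects, row)) for row in X]
sd_g = statistics.pstdev(g)
y = [gi / sd_g + rng.gauss(0.0, 0.7) for gi in g]  # heritability about 1 / 1.49 ~ 0.67
pred = gblup_predict(X[:n_train], y[:n_train], X[n_train:], h2=0.67)
y_test = y[n_train:]
cases = [v > statistics.median(y_test) for v in y_test]  # hypothetical case/control split
print("AUC:", round(auc(pred, cases), 2))
```

Because no individual effect is estimated, the cost scales with the number of individuals rather than the number of SNPs, which is the practical appeal of random-effects prediction for highly polygenic traits.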
Genetic dissection of MAPK-mediated complex traits across S. cerevisiae
Sebastian Treusch, Frank W Albert, Joshua S Bloom, Iulia E Kotenko, Leonid Kruglyak
Signaling pathways enable cells to sense and respond to their environment. Many cellular signaling strategies are conserved from fungi to humans, yet their activity and phenotypic consequences can vary extensively among individuals within a species. A systematic assessment of the impact of naturally occurring genetic variation on signaling pathways has yet to be conducted. In S. cerevisiae, both the response and the resistance to stressors that activate signaling pathways differ among diverse isolates. Here, we present a quantitative trait locus (QTL) mapping approach that enables us to identify genetic variants underlying such phenotypic differences across the genetic and phenotypic diversity of S. cerevisiae. Using a round-robin cross between twelve diverse strains, we determined the genetic architectures of phenotypes critically dependent on MAPK signaling cascades. The genetic variants we identified fell within MAPK signaling networks themselves as well as within other interconnected signaling pathways, illustrating how genetic variation can shape the phenotypic output of highly conserved signaling cascades.
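As an illustration of the two ingredients named here, the sketch below builds a round-robin crossing scheme and runs a single-marker LOD scan in haploid segregants. Both are assumptions: the abstract does not spell out the cross layout or the mapping statistic. The layout shown pairs each strain with the next and closes the circle, and the LOD score compares a one-mean model with a separate mean per parental allele, LOD = (n/2) log10(RSS0 / RSS1). Strain names, marker data, and the causal marker are hypothetical, and markers are simulated as unlinked.

```python
import math
import random
import statistics


def round_robin_crosses(strains):
    """Pair each strain with the next, closing the circle (A x B, B x C, ..., last x A)."""
    return [(strains[i], strains[(i + 1) % len(strains)]) for i in range(len(strains))]


def lod_scan(genotypes, phenotypes):
    """Single-marker LOD scores for haploid segregants with markers coded 0/1 by parent."""
    n = len(phenotypes)
    mean = statistics.fmean(phenotypes)
    rss0 = sum((y - mean) ** 2 for y in phenotypes)  # null model: one mean
    lods = []
    for j in range(len(genotypes[0])):
        rss1 = 0.0  # alternative model: separate mean per parental allele
        for allele in (0, 1):
            group = [y for g, y in zip(genotypes, phenotypes) if g[j] == allele]
            if group:
                gm = statistics.fmean(group)
                rss1 += sum((y - gm) ** 2 for y in group)
        lods.append(n / 2 * math.log10(rss0 / rss1) if rss1 > 0 else math.inf)
    return lods


crosses = round_robin_crosses([f"S{i}" for i in range(1, 13)])
print(len(crosses), crosses[-1])  # 12 ('S12', 'S1')

rng = random.Random(3)
G = [[rng.randrange(2) for _ in range(20)] for _ in range(200)]  # hypothetical segregants
y = [1.0 * g[7] + rng.gauss(0.0, 1.0) for g in G]              # marker 7 is causal
lods = lod_scan(G, y)
print(max(range(20), key=lods.__getitem__))
```

In a round-robin design each strain contributes to two crosses, so an allele carried by one strain can be mapped in two different genetic backgrounds.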
A statistical test for lineage-specific natural selection on quantitative traits based on multiple-line crosses
Nico Riedel, Bhavin S. Khatri, Michael Lässig, Johannes Berg
Comments: 21 pages, 11 figures
Subjects: Populations and Evolution (q-bio.PE)
Phenotypic differences between species may be attributable to natural selection. However, quantifying the strength of evidence for selection acting on a particular trait is difficult. Here we develop a population-genetic test for selection acting on a quantitative trait, based on multiple-line crosses. We show that using multiple lines increases both the power and the scope of selection inference. First, a test based on three or more lines detects selection on a quantitative trait with strongly increased statistical significance, which our analysis quantifies. Second, a multiple-line test makes it possible to distinguish selection from neutral evolution, as well as lineage-specific selection from selection of uniform strength across lineages. This is in contrast to tests based on two lines, from which only differences in selection coefficients can be inferred. Our analytical results are complemented by extensive numerical simulations. We apply the multiple-line test to QTL data on floral character traits in plant species of the genus Mimulus and on photoperiodic traits in different maize strains. In both cases, we find a signature of lineage-specific selection that is not seen in a two-line test. We also extend the multiple-line test to short divergence times.
Genome-wide association of foraging behavior in Drosophila melanogaster fails to support large-effect alleles at the foraging gene
Thomas Turner, Christopher C Giauque, Daniel R Schrider, Andrew D Kern
Thirty-four years ago, it was postulated that natural populations of Drosophila melanogaster comprise two behavioral morphs, termed “rover” and “sitter”, and that this variation is caused mainly by large-effect alleles at a single locus. Since then, considerable data have been amassed comparing the behavior and physiology of these morphs. Contrary to common assertions, however, published support for the existence of common large-effect alleles in nature is quite limited. To investigate further, we quantified the foraging behavior of 36 natural strains, performed a genome-wide association study, and described patterns of molecular evolution at the foraging locus. Although there was significant variation in foraging behavior among genotypes, this variation was continuously distributed and not significantly associated with genetic variation at the foraging gene. Patterns of molecular population genetic variation at this gene also provide no support for the hypothesis that foraging (for) is a target of long-term balancing selection. We propose that additional data are required to support a hypothesis of common alleles of large effect on foraging behavior in nature. Genome-wide association does support a role for natural variation at several other loci, including the sulfateless gene, though these associations should be considered preliminary until validated with a larger sample size.
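With only 36 strains, genome-wide significance is hard to reach, which is why the abstract treats its hits as preliminary. The sketch below shows one generic way to get family-wise p-values in such a small panel: a per-SNP association statistic (absolute correlation) with the max-statistic permutation method. The abstract does not describe the authors' GWAS model, so this is an assumed illustration; the 0/1 genotype coding (homozygous strains), the simulated data, and the function names are hypothetical, and population structure among strains is ignored.

```python
import random
import statistics


def _abs_corr(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant phenotype: no association signal
    return abs(sxy) / (sxx * syy) ** 0.5


def gwas_max_t(genotypes, phenotype, n_perm=999, seed=0):
    """Per-SNP family-wise p-values from the max-statistic permutation method."""
    m = len(genotypes[0])
    cols = [[g[j] for g in genotypes] for j in range(m)]
    observed = [_abs_corr(c, phenotype) for c in cols]
    rng = random.Random(seed)
    y = list(phenotype)
    exceed = [0] * m
    for _ in range(n_perm):
        rng.shuffle(y)
        # The largest statistic across all SNPs under the null controls the family-wise error.
        max_stat = max(_abs_corr(c, y) for c in cols)
        for j in range(m):
            if max_stat >= observed[j]:
                exceed[j] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Hypothetical panel: 36 strains, 20 SNPs, SNP 0 affects the phenotype.
rng = random.Random(7)
strains = [[rng.randrange(2) for _ in range(20)] for _ in range(36)]
pheno = [2.0 * s[0] + rng.gauss(0.0, 0.5) for s in strains]
pvals = gwas_max_t(strains, pheno, n_perm=199)
print([round(p, 3) for p in pvals[:3]])
```

The permutation approach avoids distributional assumptions, which matters at this sample size, but real panels of natural strains usually also need a correction for relatedness.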
Regulatory variants explain much more heritability than coding variants across 11 common diseases
Alexander Gusev, S Hong Lee, Benjamin M Neale, Gosia Trynka, Bjarni J Vilhjalmsson, Hilary Finucane, Han Xu, Chongzhi Zang, Stephan Ripke, Eli Stahl, Schizophrenia Working Group of the PGC, SWE-SCZ Consortium, Anna K Kahler, Christina M Hultman, Shaun M Purcell, Steven A McCarroll, Mark Daly, Bogdan Pasaniuc, Patrick F Sullivan, Naomi R Wray, Soumya Raychaudhuri, Alkes L Price
Common variants implicated by genome-wide association studies (GWAS) of complex diseases are known to be enriched for coding and regulatory variants. We applied methods to partition the heritability explained by genotyped SNPs (h2g) across functional categories (while accounting for shared variance due to linkage disequilibrium) to genotyped and imputed data for 11 common diseases. DNase I hypersensitivity sites (DHS) from 218 cell types, spanning 16% of the genome, explained an average of 79% of h2g (5.1× enrichment; P < 10⁻²⁰); further enrichment was observed at enhancer and cell-type-specific DHS elements. The enrichments were much smaller in analyses that did not use imputed data or were restricted to GWAS-associated SNPs. In contrast, coding variants, spanning 1% of the genome, explained only 8% of h2g (13.8× enrichment; P = 5 × 10⁻⁴). We replicated these findings but found no significant contribution from rare coding variants in an independent schizophrenia cohort genotyped on GWAS and exome chips.
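The enrichment figures quoted here compare a category's share of h2g with its share of SNPs. Once per-category variance components have been estimated, the arithmetic is a short calculation, sketched below under the simplifying assumption that categories are disjoint and cover all SNPs; it ignores the LD adjustment the abstract mentions. The function name and the input numbers are hypothetical, not values from the study.

```python
def partition_enrichment(components):
    """Per-category share of h2g and enrichment = share of h2g / share of SNPs.

    `components` maps a category name to (estimated h2g component, number of SNPs);
    categories are assumed to be disjoint and to cover all SNPs.
    """
    total_h2 = sum(h for h, _ in components.values())
    total_snps = sum(n for _, n in components.values())
    result = {}
    for name, (h2_cat, n_cat) in components.items():
        share_h2 = h2_cat / total_h2
        share_snps = n_cat / total_snps
        result[name] = (share_h2, share_h2 / share_snps)
    return result


# Hypothetical variance-component estimates, not values from the study.
est = {"cat_A": (0.05, 2_000), "cat_B": (0.15, 18_000), "cat_C": (0.05, 80_000)}
for name, (share, enr) in partition_enrichment(est).items():
    print(f"{name}: {share:.0%} of h2g, {enr:.2f}x enrichment")
```

An enrichment above 1 means the category explains more heritability than its size alone would predict; by construction, the SNP-share-weighted enrichments sum to 1 across categories.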