C. elegans harbors pervasive cryptic genetic variation for embryogenesis
Annalise Paaby, Amelia White, David Riccardi, Kristin Gunsalus, Fabio Piano, Matthew Rockman
Conditionally functional mutations are an important class of natural genetic variation, yet little is known about their prevalence in natural populations or their contribution to disease risk. Here, we describe a vast reserve of cryptic genetic variation, alleles that are normally silent but which affect phenotype when the function of other genes is perturbed, in the gene networks of C. elegans embryogenesis. We find evidence that cryptic-effect loci are ubiquitous and segregate at intermediate frequencies in the wild. The cryptic alleles demonstrate low developmental pleiotropy, in that specific, rather than general, perturbations are required to reveal them. Our findings underscore the importance of genetic background in characterizing gene function and provide a model for the expression of conditionally functional effects that may be fundamental in basic mechanisms of trait evolution and the genetic basis of disease susceptibility.
An amino acid polymorphism in the Drosophila insulin receptor demonstrates pleiotropic and adaptive function in life history traits
Annalise B. Paaby, Alan O. Bergland, Emily L. Behrman, Paul S. Schmidt
Finding the specific nucleotides that underlie adaptive variation is a major goal in evolutionary biology, but polygenic traits pose a challenge because the complex genotype-phenotype relationship can obscure the effects of individual alleles. However, natural selection working in large wild populations can shift allele frequencies and indicate functional regions of the genome. Previously, we showed that the two most common alleles of a complex amino acid insertion-deletion polymorphism in the Drosophila insulin receptor show independent, parallel clines in frequency across the North American and Australian continents. Here, we report that the cline is stable over at least a five-year period and that the polymorphism also demonstrates temporal shifts in allele frequency concurrent with seasonal change. We tested the alleles for effects on levels of insulin signaling, fecundity, development time, body size, stress tolerance, and lifespan. We find that the alleles are associated with predictable differences in these traits, consistent with patterns of Drosophila life history variation across geography that likely reflect adaptation to the heterogeneous climatic environment. These results implicate insulin signaling as a major mediator of life history adaptation in Drosophila, and suggest that life history tradeoffs can be explained by extensive pleiotropy at a single locus.
On the genetic architecture of intelligence and other quantitative traits
Stephen D.H. Hsu
(Submitted on 14 Aug 2014)
How do genes affect cognitive ability or other human quantitative traits such as height or disease risk? Progress on this challenging question is likely to be significant in the near future. I begin with a brief review of psychometric measurements of intelligence, introducing the idea of a “general factor” or g score. The main results concern the stability, validity (predictive power), and heritability of adult g. The largest component of genetic variance for both height and intelligence is additive (linear), leading to important simplifications in predictive modeling and statistical estimation. Due mainly to the rapidly decreasing cost of genotyping, it is possible that within the coming decade researchers will identify loci which account for a significant fraction of total g variation. In the case of height, analogous efforts are well under way. I describe some unpublished results concerning the genetic architecture of height and cognitive ability, which suggest that roughly 10k moderately rare causal variants of mostly negative effect are responsible for normal population variation. Using results from Compressed Sensing (L1-penalized regression), I estimate the statistical power required to characterize both linear and nonlinear models for quantitative traits. The main unknown parameter s (sparsity) is the number of loci which account for the bulk of the genetic variation. The required sample size is of order 100 s (that is, about 100 times the sparsity), or roughly a million in the case of cognitive ability.
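The sample-size estimate in this abstract is simple arithmetic, and a minimal sketch makes it explicit. The function name `required_sample_size` and its `constant` parameter are hypothetical names chosen for illustration. The factor of 100 and the sparsity of roughly 10k are the only values taken from the abstract, and the factor is an order-of-magnitude guide rather than an exact constant.

```python
def required_sample_size(sparsity: int, constant: float = 100.0) -> int:
    """Order-of-magnitude sample size n ~ constant * s.

    `sparsity` is s, the number of loci that account for the bulk of the
    genetic variation. `constant` defaults to the factor of 100 quoted in
    the abstract; treat it as a rough guide, not a precise value.
    """
    if sparsity <= 0:
        raise ValueError("sparsity must be a positive number of loci")
    return int(round(constant * sparsity))


if __name__ == "__main__":
    # The abstract suggests roughly 10k causal variants for cognitive ability.
    s = 10_000
    print(required_sample_size(s))
```

With s = 10,000 this reproduces the figure of roughly a million individuals quoted for cognitive ability.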
The overdue promise of short tandem repeat variation for heritability
Maximilian Press, Keisha D. Carlson, Christine Queitsch
Short tandem repeat (STR) variation has been proposed as a major explanatory factor in the heritability of complex traits in humans and model organisms. However, we still struggle to incorporate STR variation into genotype-phenotype maps. Here, we review the promise of STRs in contributing to complex trait heritability, and highlight the challenges that STRs pose due to their repetitive nature. We argue that STR variants are more likely than single nucleotide variants to have epistatic interactions, reiterate the need for targeted assays to accurately genotype STRs, and call for more appropriate statistical methods in detecting STR-phenotype associations. Lastly, somatic STR variation within individuals may serve as a read-out of disease susceptibility, and is thus potentially a valuable covariate for future association studies.
Power analysis of artificial selection experiments using efficient whole genome simulation of quantitative traits
Darren Kessner, John Novembre
Evolve and resequence studies combine artificial selection experiments with massively parallel sequencing technology to study the genetic basis for complex traits. In these experiments, individuals are selected for extreme values of a trait, causing alleles at quantitative trait loci (QTLs) to increase or decrease in frequency in the experimental population. We present a new analysis of the power of artificial selection experiments to detect and localize quantitative trait loci. This analysis uses a simulation framework that explicitly models whole genomes of individuals, quantitative traits, and selection based on individual trait values. We find that explicitly modeling QTLs provides qualitatively different insights than considering independent loci with constant selection coefficients. Specifically, we observe how interference between QTLs under selection impacts the trajectories and lengthens the fixation times of selected alleles. We also show that a substantial portion of the genetic variance of the trait (50-100%) can be explained by detected QTLs in as little as 20 generations of selection, depending on the trait architecture and experimental design. Furthermore, we show that power depends crucially on the opportunity for recombination during the experiment. Finally, we show that an increase in power is obtained by leveraging founder haplotype information to obtain allele frequency estimates.
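The core mechanism described here, selecting on individual trait values and watching QTL allele frequencies shift, can be illustrated with a toy simulation. This is a deliberately simplified sketch and not the authors' framework: it assumes unlinked loci with free recombination, equal additive effects, and truncation selection, so it cannot show the linkage and interference effects the abstract emphasizes. All names and parameter values are hypothetical.

```python
import random


def simulate_truncation_selection(n_individuals=200, n_loci=10, generations=20,
                                  selected_fraction=0.2, noise_sd=1.0, seed=1):
    """Toy model: mean frequency of trait-increasing alleles per generation.

    Assumptions (not from the abstract): diploid individuals, unlinked
    biallelic loci, each '1' allele adds one unit to the trait, Gaussian
    environmental noise, and the top `selected_fraction` become parents.
    """
    rng = random.Random(seed)

    def random_haplotype():
        return [rng.randint(0, 1) for _ in range(n_loci)]

    def mean_frequency(population):
        ones = sum(sum(h1) + sum(h2) for h1, h2 in population)
        return ones / (2 * len(population) * n_loci)

    def gamete(individual):
        # Free recombination: each locus is inherited independently.
        h1, h2 = individual
        return [rng.choice((a, b)) for a, b in zip(h1, h2)]

    def phenotype(individual):
        genetic_value = sum(individual[0]) + sum(individual[1])
        return genetic_value + rng.gauss(0.0, noise_sd)

    population = [(random_haplotype(), random_haplotype())
                  for _ in range(n_individuals)]
    n_parents = max(2, int(selected_fraction * n_individuals))
    trajectory = [mean_frequency(population)]

    for _ in range(generations):
        # Truncation selection: keep the individuals with the largest traits.
        ranked = sorted(population, key=phenotype, reverse=True)
        parents = ranked[:n_parents]
        next_generation = []
        for _ in range(n_individuals):
            mother, father = rng.sample(parents, 2)
            next_generation.append((gamete(mother), gamete(father)))
        population = next_generation
        trajectory.append(mean_frequency(population))

    return trajectory


if __name__ == "__main__":
    freqs = simulate_truncation_selection()
    print(f"start: {freqs[0]:.2f}, after {len(freqs) - 1} generations: {freqs[-1]:.2f}")
```

Replacing the independent loci with haplotypes that recombine at a finite rate would be the first step toward the whole-genome setting the abstract describes.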
Powerful tests for multi-marker association analysis using ensemble learning
Multi-marker approaches are currently gaining a lot of interest in genome-wide association studies and can enhance power to detect new associations under certain conditions. Gene- and pathway-based association tests are increasingly being viewed as useful complements to the more widely used single-marker association analyses, which have successfully uncovered numerous disease variants. A major drawback of single-marker methods is that they do not consider pairwise and higher-order interactions between variants. Here, we describe multi-variate methods for gene- and pathway-based association analyses using phenotype predictions based on machine learning algorithms. Instead of utilizing only a linear or logistic regression model, we propose the use of ensembles of diverse machine learning algorithms for testing multi-variate associations. As the true mathematical relationship between a phenotype and any group of genetic and clinical variables is unknown in advance and may be complex, such a strategy gives us a general and flexible framework to approximate this relationship across different sets of SNPs. We show how phenotype prediction based on our method can be used for constructing tests for SNP set association analysis. We first apply our method to simulated datasets to demonstrate its power and correctness. Then, we apply our method to previously studied asthma-related genes in two independent asthma cohorts to conduct association tests.
LIMIX: genetic analysis of multiple traits
Christoph Lippert, Francesco Paolo Casale, Barbara Rakitsch, Oliver Stegle
Multi-trait mixed models have emerged as a promising approach for joint analyses of multiple traits. In principle, the mixed model framework is remarkably general. However, current methods implement only a very specific range of tasks to optimize the necessary computations. Here, we present a multi-trait modeling framework that is versatile and fast: LIMIX allows mixed models to be flexibly adapted to a broad range of applications with different observed and hidden covariates, and variable study designs. To highlight the novel modeling aspects of LIMIX, we performed three vastly different genetic studies: joint GWAS of correlated blood lipid phenotypes, joint analysis of the expression levels of the multiple transcript isoforms of a gene, and pathway-based modeling of molecular traits across environments. In these applications we show that LIMIX increases GWAS power and phenotype prediction accuracy, in particular when integrating stepwise multi-locus regression into multi-trait models, and when analyzing large numbers of traits. An open source implementation of LIMIX is freely available at: https://github.com/PMBio/limix.