An alternative to the breeder’s and Lande’s equations

Bahram Houchmandzadeh (LIPhy)
(Submitted on 2 Sep 2013)

The breeder’s equation is a cornerstone of quantitative genetics and is widely used in evolutionary modeling. The equation, which reads $R=h^{2}S$, relates the response to selection $R$ (the mean phenotype of the progeny) to the selection differential $S$ (the mean phenotype of the selected parents) through a simple proportionality relation. The validity of this relation, however, relies strongly on a normal (Gaussian) distribution of the parents’ genotypes, which is an unobservable quantity and cannot be ascertained. In contrast, we show here that if the fitness (or selection) function is Gaussian, an alternative, exact linear equation of the form $R'=j^{2}S'$ can be derived, regardless of the parental genotype distribution. Here $R'$ and $S'$ stand for the mean phenotypic lag behind the mean of the fitness function in the offspring and selected populations. To demonstrate this relation, we derive the exact functional relation between the mean phenotype in the selected and the offspring populations and deduce all cases that lead to a linear relation between these quantities. These computations, which are confirmed by individual-based numerical simulations, generalize naturally to the multivariate Lande’s equation $\Delta\bar{\mathbf{z}}=GP^{-1}\mathbf{S}$.
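The classical proportionality $R=h^{2}S$ can be checked with a small individual-based simulation under exactly the Gaussian assumptions the abstract questions. This is a minimal sketch and not the paper's $R'=j^{2}S'$ derivation. It assumes Gaussian genotypic values, additive Gaussian environmental noise, truncation selection, and offspring that inherit their parent's genotypic value exactly (asexual, purely additive). The function name and parameters are illustrative only.

```python
import random
import statistics


def breeder_simulation(n=100_000, var_g=1.0, var_e=1.0, top_fraction=0.2, seed=1):
    """One generation of truncation selection; returns (S, R, h2).

    Illustrative assumptions (not from the paper): genotypic values
    g ~ N(0, var_g), phenotypes z = g + e with e ~ N(0, var_e), and each
    offspring inherits its selected parent's g exactly (asexual, purely
    additive) plus a fresh environmental deviation.
    """
    rng = random.Random(seed)
    sd_g, sd_e = var_g ** 0.5, var_e ** 0.5

    # Parental generation: (genotype, phenotype) pairs.
    parents = []
    for _ in range(n):
        g = rng.gauss(0.0, sd_g)
        parents.append((g, g + rng.gauss(0.0, sd_e)))
    mean_z = statistics.fmean(z for _, z in parents)

    # Truncation selection on phenotype: keep the top fraction.
    parents.sort(key=lambda pair: pair[1], reverse=True)
    selected = parents[: int(top_fraction * n)]

    # Selection differential S: selected mean minus population mean.
    S = statistics.fmean(z for _, z in selected) - mean_z

    # Offspring phenotype = inherited genotype + new environmental deviation.
    offspring = [g + rng.gauss(0.0, sd_e) for g, _ in selected]
    R = statistics.fmean(offspring) - mean_z

    h2 = var_g / (var_g + var_e)  # narrow-sense heritability
    return S, R, h2


S, R, h2 = breeder_simulation()
print(f"S={S:.3f}  R={R:.3f}  R/S={R / S:.3f}  h2={h2:.3f}")
```

With Gaussian genotypes, R/S should come out close to h2 = 0.5. You could probe the abstract's point by replacing the Gaussian draw of `g` with a skewed distribution. Then E[g | z] is generally no longer linear in z, and R/S need not equal h2.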

Using Volcano Plots and Regularized-Chi Statistics in Genetic Association Studies

Wentian Li, Jan Freudenberg, Young Ju Suh, Yaning Yang
(Submitted on 28 Aug 2013)

Labor-intensive experiments are typically required to identify the causal disease variants from a list of disease-associated variants in the genome. For designing such experiments, candidate variants are ranked by the strength of their genetic association with the disease. However, the two commonly used measures of genetic association, the odds ratio (OR) and the p-value, may rank variants in different orders. To integrate these two measures into a single analysis, here we transfer the volcano plot methodology from gene expression analysis to genetic association studies. In its original setting, a volcano plot is a scatter plot of fold change against the t-test statistic (or -log of the p-value), with the latter being more sensitive to sample size. In genetic association studies, the OR and Pearson’s chi-square statistic (or equivalently its square root, chi, or the standardized log(OR)) can be used analogously in a volcano plot, allowing for their visual inspection. Moreover, the geometric interpretation of these plots leads to an intuitive method for filtering results by a combination of the OR and the chi-square statistic, which we term “regularized-chi”. This method selects associated markers by a smooth curve in the volcano plot instead of the right-angled lines that correspond to independent cutoffs for the OR and the chi-square statistic. The regularized-chi incorporates relatively more signals from variants with lower minor allele frequencies than the chi-square test statistic does. As rare variants tend to have stronger functional effects, regularized-chi is better suited to prioritizing candidate genes.
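The volcano-plot coordinates described above can be computed from an allelic 2x2 table. This is a minimal sketch, assuming (minor, major) allele counts in cases and controls. It returns four quantities:

- log(OR);
- Pearson's chi-square without continuity correction;
- the signed chi, which carries the direction of the OR;
- the standardized log(OR), that is, log(OR) divided by its Woolf standard error.

The function name is hypothetical, and the regularized-chi filter itself is not reproduced, because its formula is not given in the abstract.

```python
import math


def volcano_coordinates(case_counts, control_counts):
    """Volcano-plot coordinates from an allelic 2x2 table.

    case_counts = (a, b): minor/major allele counts in cases.
    control_counts = (c, d): minor/major allele counts in controls.
    Returns (log_or, chi_square, signed_chi, standardized_log_or).
    """
    a, b = case_counts
    c, d = control_counts
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    log_or = math.log((a * d) / (b * c))
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table (no Yates correction).
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Signed chi: square root of chi-square, carrying the sign of log(OR).
    signed_chi = math.copysign(math.sqrt(chi_square), log_or)
    # Woolf standard error of log(OR) gives the standardized log(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, chi_square, signed_chi, log_or / se


# Hypothetical counts: 30/70 minor/major alleles in cases, 15/85 in controls.
print(volcano_coordinates((30, 70), (15, 85)))
```

In a volcano plot, log(OR) goes on the horizontal axis and chi (or the standardized log(OR)) on the vertical axis. Independent cutoffs on the two axes produce the right-angled selection region that the abstract contrasts with a smooth curve.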

Gene and Gene-Set Analysis for Genome-Wide Association Studies

Inti Pedroso
(Submitted on 19 Aug 2013)

Genome-wide association studies (GWAS) have identified hundreds of loci at very stringent levels of statistical significance across many different human traits. However, it is now clear that very large samples (n~10^4-10^5) are needed to find the majority of genetic variants underlying risk for most human diseases. Therefore, the field has engaged in a race to increase study sample sizes, with some studies yielding very successful results but others providing little or no new insight. This project started early in this new wave of studies, and I decided to use an alternative approach that uses prior biological knowledge to improve both the interpretation and the power of GWAS. The project aimed to a) implement and develop new gene-based methods that derive gene-level statistics, so that GWAS results can be used in well-established systems biology tools; b) use these gene-level statistics in network and gene-set analyses of GWAS data; c) mine GWAS of neuropsychiatric disorders using gene, gene-set and integrative biology analyses with gene-expression studies; and d) explore the ability of these methods to improve the analysis of GWAS of disease sub-phenotypes, which usually suffer from very small sample sizes.
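The abstract does not specify its gene-based methods. One simple, generic way to turn SNP-level p-values into a gene-level statistic is Fisher's combination method, sketched here. This technique is swapped in for illustration and is not necessarily the one developed in the project. The function name is hypothetical.

The sketch assumes the SNP p-values within a gene are independent. Linkage disequilibrium violates that assumption and makes the combined p-value anti-conservative.

```python
import math


def fisher_combined_p(p_values):
    """Gene-level p-value from SNP p-values via Fisher's method.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom, assuming the k p-values are independent.
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    x = -2.0 * sum(math.log(p) for p in p_values)
    # Closed-form survival function for even degrees of freedom 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


# Hypothetical SNP p-values mapped to one gene.
print(fisher_combined_p([0.01, 0.20, 0.03]))
```

With a single SNP, the function returns that SNP's p-value unchanged, which is a quick sanity check of the closed-form tail probability.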

The missing heritability revealed in Arabidopsis thaliana

Xia Shen
(Submitted on 30 Jul 2013)

Although high-throughput genomic data are widely available, a large proportion of the narrow-sense heritability of many complex traits has not been successfully uncovered. In this study, focusing on phenotype prediction, I show that by properly selecting a small number of loci, a significant amount of missing heritability can be revealed. The results provide new insights into the missing heritability problem and the underlying genetic architecture of complex traits.

Wavelet-based genetic association analysis of functional phenotypes arising from high-throughput sequencing assays

Heejung Shim, Matthew Stephens
(Submitted on 27 Jul 2013)

Understanding how genetic variants influence cellular-level processes is an important step towards understanding how they influence important organismal-level traits, or “phenotypes”, including human disease susceptibility. To this end, scientists are undertaking large-scale genetic association studies that aim to identify genetic variants associated with molecular and cellular phenotypes, such as gene expression, transcription factor binding, or chromatin accessibility. These studies use high-throughput sequencing assays (e.g. RNA-seq, ChIP-seq, DNase-seq) to obtain high-resolution data on how the traits vary along the genome in each sample. However, typical association analyses fail to exploit these high-resolution measurements, instead aggregating the data at coarser resolutions, such as genes or fixed-length windows. Here we develop and apply statistical methods that better exploit the high-resolution data. The key idea is to treat the sequence data as measuring an underlying “function” that varies along the genome, and then, building on wavelet-based methods for functional data analysis, test for association between genetic variants and the underlying function. Applying these methods to identify genetic variants associated with chromatin accessibility (dsQTLs), we find that they identify substantially more associations than a simpler window-based analysis, and in total we identify 772 novel dsQTLs not identified by the original analysis.

Genetics of single-cell protein abundance variation in large yeast populations

Frank W. Albert, Sebastian Treusch, Arthur H. Shockley, Joshua S. Bloom, Leonid Kruglyak
(Submitted on 25 Jul 2013)

Many DNA sequence variants influence phenotypes by altering gene expression. Our understanding of these variants is limited by the sample sizes of current studies and by measurements of mRNA rather than protein abundance. We developed a powerful method for identifying genetic loci that influence protein expression in very large populations of the yeast Saccharomyces cerevisiae. The method measures single-cell protein abundance through the use of green fluorescent protein (GFP) tags. We applied this method to 160 genes and detected many more loci per gene than previous studies. We also observed closer correspondence between loci that influence protein abundance and loci that influence mRNA abundance of a given gene. Most loci cluster at hotspot locations that influence multiple proteins, in some cases more than half of those examined. The variants that underlie these hotspots have profound effects on the gene regulatory network and provide insights into genetic variation in cell physiology between yeast strains.

The impact of population demography and selection on the genetic architecture of complex traits

Kirk E. Lohmueller
(Submitted on 21 Jun 2013)

Studies of thousands of individuals have found genetic evidence for dramatic population growth in recent human history. These studies have also documented high numbers of amino-acid-changing polymorphisms that are likely evolutionarily important and may be of medical relevance. Here I use population genetic models to demonstrate how recent population growth has directly led to the accumulation of deleterious amino-acid-changing polymorphisms. I show that recent growth increases the proportion of nonsynonymous SNPs and that the average mutation is more deleterious in an expanding population than in a non-expanded population. However, population growth does not affect the genetic load of the population. Additionally, I investigate the consequences of recent population growth for the architecture of complex traits. If a mutation’s effect on disease status is correlated with its effect on fitness, then rare variants explain a greater portion of the additive genetic variance of the trait in a population that has recently expanded than in a population that has not. Further, recent growth can increase the expected number of causal variants for a disease. Such heterogeneity will likely reduce the power of commonly used rare-variant association tests. Finally, recent population growth also reduces the causal allele frequency in cases at single mutations, which could decrease the power of single-marker association tests. These findings suggest that careful consideration of recent population history will be essential for designing optimal association studies for low-frequency and rare variants.

Genetic Complexity in a Drosophila Model of Diabetes-Associated Misfolded Human Proinsulin

Soo-Young Park, Michael Z. Ludwig, Natalia A. Tamarina, Bin Z. He, Sarah H. Carl, Desiree A. Dickerson, Levi Barse, Bharath Arun, Calvin Williams, Cecelia M. Miles, Louis H. Philipson, Donald F. Steiner, Graeme I. Bell, Martin Kreitman
(Submitted on 31 May 2013)

Here we use Drosophila melanogaster to create a genetic model of human permanent neonatal diabetes mellitus and present experimental results describing dimensions of this complexity. The approach involves the transgenic expression of a misfolded mutant of human preproinsulin, hINSC96Y, which is a cause of the disease. When expressed in fly imaginal discs, hINSC96Y causes a reduction of adult structures, including the eye, wing and notum. Eye imaginal discs exhibit defects in both the structure and the arrangement of ommatidia. In the wing, expression of hINSC96Y leads to ectopic veins and mechanosensory organs, indicating disruption of the wild-type signaling processes that regulate cell fates. These readily measurable disease phenotypes are sensitive to temperature, gene dose and sex. Mutant (but not wild-type) proinsulin expression in the eye imaginal disc induces IRE1-mediated Xbp1 alternative splicing, a signal of endoplasmic reticulum stress response activation, and produces global changes in gene expression. Mutant hINS transgene tester strains, when crossed to stocks from the Drosophila Genetic Reference Panel, produce F1 adults with a continuous range of disease phenotypes and large broad-sense heritability. Surprisingly, in these crosses the severity of mutant hINS-induced disease in the eye is not correlated with that in the notum. Nor is it correlated with the eye reduction phenotypes caused by expressing two dominant eye mutants, Drop (Dr) and Lobe (L), which act in two different eye development pathways, when these are crossed into the same genetic backgrounds. The tissue specificity of genetic variability for mutant hINS-induced disease thus has its own distinct signature. The genetic dominance of disease-specific phenotypic variability makes this approach amenable to genome-wide association studies (GWAS) in a simple F1 screen of natural variation.
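The large broad-sense heritability mentioned above refers to H² = Vg / (Vg + Ve) across F1 genotypes. A standard way to estimate it from replicate measurements is a one-way random-effects ANOVA, sketched here. The sketch assumes a balanced design, with the same number of flies scored per F1 cross. The function name and the example measurements are hypothetical, and the paper's own estimator may differ.

```python
import statistics


def broad_sense_heritability(groups):
    """Estimate H2 = Vg / (Vg + Ve) from a balanced one-way random-effects ANOVA.

    groups: one list of phenotype measurements per genotype (e.g. per F1
    cross), each list with the same number n >= 2 of replicates.
    """
    k, n = len(groups), len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 genotypes, each with the same n >= 2 replicates")
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)  # equals the overall mean when balanced
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    # E[MS_between] = Ve + n*Vg and E[MS_within] = Ve; clamp negative Vg to 0.
    var_g = max((ms_between - ms_within) / n, 0.0)
    return var_g / (var_g + ms_within)


# Hypothetical eye-area scores for three F1 crosses, three flies each.
crosses = [[0.61, 0.63, 0.60], [0.45, 0.47, 0.44], [0.72, 0.70, 0.73]]
print(broad_sense_heritability(crosses))
```

Clamping a negative variance estimate to zero is a common convention. A restricted maximum-likelihood fit would handle unbalanced designs more gracefully.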

Effect of Genetic Variation in a Drosophila Model of Diabetes-Associated Misfolded Human Proinsulin

Bin Z. He, Michael Z. Ludwig, Desiree A. Dickerson, Levi Barse, Bharath Arun, Soo Young Park, Natalia A. Tamarina, Scott B. Selleck, Patricia Wittkopp, Graeme I. Bell, Martin Kreitman
(Submitted on 23 May 2013)

The identification and validation of gene-gene interactions is a major challenge in human studies. Here, we explore an approach for studying epistasis in humans using a Drosophila melanogaster model of neonatal diabetes mellitus. Expression of mutant preproinsulin, hINSC96Y, in the eye imaginal disc mimics the human disease, activating conserved cell stress response pathways that lead to cell death and a reduction in eye area. Dominant-acting variants in wild-derived inbred lines from the Drosophila Genetic Reference Panel produce a continuous, highly heritable distribution of eye degeneration phenotypes. A genome-wide association study (GWAS) in 154 sequenced lines identified 29 candidate SNPs in 16 loci with P 7.62). RNAi knock-downs of sfl enhanced the eye degeneration phenotype in a mutant-hINS-dependent manner. sfl encodes a protein required for sulfation of the glycosaminoglycan heparan sulfate. Two additional genes in the heparan sulfate (HS) biosynthetic pathway (tout velu, ttv, and brother of tout velu, botv) also modified the eye phenotype, suggesting a link between HS-modified proteins and cellular responses to misfolded proteins. Finally, intronic variants marking the QTL were associated with decreased sfl expression, a result consistent with that predicted by RNAi studies. The ability to create a model of human genetic disease in the fly, map a QTL by GWAS to a specific gene (and noncoding variant), validate its contribution to disease with available genetic resources, and experimentally link the variant to a molecular mechanism demonstrates the many advantages Drosophila holds for determining the genetic underpinnings of human disease.

Narrowing the gap on heritability of common disease by direct estimation in case-control GWAS

David Golan, Saharon Rosset
(Submitted on 23 May 2013)

One of the major developments in recent years in the search for missing heritability of human phenotypes is the adoption of linear mixed-effects models (LMMs) to estimate heritability due to genetic variants which are not significantly associated with the phenotype. A variant of the LMM approach has been adapted to case-control studies and applied to many major diseases by Lee et al. (2011), successfully accounting for a considerable portion of the missing heritability. For example, for Crohn’s disease their estimated heritability was 22% compared to 50-60% from family studies. In this letter we propose to estimate heritability of disease directly by regression of phenotype similarities on genotype correlations, corrected to account for ascertainment. We refer to this method as genetic correlation regression (GCR). Using GCR we estimate the heritability of Crohn’s disease at 34% using the same data. We demonstrate through extensive simulation that our method yields unbiased heritability estimates, which are consistently higher than LMM estimates. Moreover, we develop a heuristic correction to LMM estimates, which can be applied to published LMM results. Applying our heuristic correction increases the estimated heritability of multiple sclerosis from 30% to 52.6%.
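The core idea of regressing phenotype similarity on genotype correlation can be sketched for a quantitative trait as Haseman-Elston-style cross-product regression. The cross-products of standardized phenotypes, z_i·z_j, are regressed on the genetic relationships G_ij for all pairs i < j, and the slope estimates h². This is not GCR itself: the correction for case-control ascertainment that the abstract describes is deliberately omitted. The function name and the relationship matrix are hypothetical.

```python
import statistics


def he_regression_h2(y, grm):
    """Haseman-Elston-style h2 estimate for a quantitative trait.

    y: phenotypes; grm: genetic relationship matrix (list of lists).
    Regresses z_i * z_j (standardized phenotypes) on grm[i][j] for i < j
    by ordinary least squares with an intercept; returns the slope.
    No ascertainment correction is applied.
    """
    mu, sd = statistics.fmean(y), statistics.pstdev(y)
    z = [(v - mu) / sd for v in y]
    xs, ys = [], []
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            xs.append(grm[i][j])
            ys.append(z[i] * z[j])
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


# Toy check: relationships built so that z_i * z_j = 0.5 * G_ij exactly.
y = [1.0, -1.0, 1.0, -1.0]
grm = [[1.0 if i == j else 2 * y[i] * y[j] for j in range(4)] for i in range(4)]
print(he_regression_h2(y, grm))
```

In real data, the off-diagonal relationships among unrelated individuals are small and noisy, so thousands of individuals are needed for a stable slope. In a case-control design, the slope must also be rescaled for ascertainment before it can be read as liability-scale heritability.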