Identification of Slco1a6 as a candidate gene that broadly affects gene expression in mouse pancreatic islets
Jianan Tian, Mark Keller, Angie Oler, Mary Rabagalia, Kathryn Schueler, Donald Stapleton, Aimee Teo Broman, Wen Zhao, Christina Kendziorski, Brian S. Yandell, Bruno Hagenbuch, Karl W Broman, Alan D. Attie
We surveyed gene expression in six tissues in an F2 intercross between mouse strains C57BL/6J (abbreviated B6) and BTBR T+ tf/J (abbreviated BTBR) made genetically obese with the Leptin(ob) mutation. We identified a number of expression quantitative trait loci (eQTL) affecting the expression of numerous genes distal to the locus, called trans-eQTL hotspots. Some of these trans-eQTL hotspots showed effects in multiple tissues, whereas some were specific to a single tissue. An unusually large number of transcripts (7% of genes) mapped in trans to a hotspot on chromosome 6, specifically in pancreatic islets. By considering the first two principal components of the expression of genes mapping to this region, we were able to convert the multivariate phenotype into a simple Mendelian trait. Fine-mapping the locus by traditional methods reduced the QTL interval to a 298 kb region containing only three genes, including Slco1a6, one member of a large family of organic anion transporters. Direct genomic sequencing of all Slco1a6 exons identified a non-synonymous coding SNP that converts a highly conserved proline residue at amino acid position 564 to serine. Molecular modeling suggests that Pro564 faces an aqueous pore within this 12-transmembrane domain-spanning protein. When transiently overexpressed in HEK293 cells, BTBR OATP1A6-mediated cellular uptake of the bile acid taurocholic acid (TCA) was enhanced compared to B6 OATP1A6. Our results suggest that genetic variation in Slco1a6 leads to altered transport of TCA (and potentially other bile acids) by pancreatic islets, resulting in broad gene regulation.
The Multi-allelic Genetic Architecture of a Variance-heterogeneity Locus for Molybdenum Accumulation Acts as a Source of Unexplained Additive Genetic Variance
Simon K G Forsberg, Matthew E Andreatta, Xin-Yuan Huang, John Danku, David E Salt, Örjan Carlborg
Most biological traits are regulated by both genetic and environmental factors. Individual loci contributing to the phenotypic diversity in a population are generally identified by their contributions to the trait mean. Genome-wide association (GWA) analyses can also detect loci based on variance differences between genotypes, and several hypotheses have been proposed regarding the possible genetic mechanisms leading to such signals. Little is, however, known about what causes them and whether this genetic variance-heterogeneity reflects mechanisms of importance in natural populations. Previously, we identified a variance-heterogeneity GWA (vGWA) signal for leaf molybdenum concentrations in Arabidopsis thaliana. Here, fine-mapping of this association to a ~78 kb Linkage Disequilibrium (LD)-block reveals that it emerges from the independent effects of three genetic polymorphisms on the high-variance associated version of this LD-block. By revealing the genetic architecture underlying this vGWA signal, we uncovered the molecular source of a significant amount of hidden additive genetic variation (“missing heritability”). Two of the three polymorphisms on the high-variance LD-block are promoter variants for Molybdate transporter 1 (MOT1), and the third is a variant located ~25 kb downstream of this gene. A fourth independent association was also detected ~600 kb upstream of the LD-block. Testing of T-DNA knockout alleles for genes in the associated regions suggests AT2G25660 (unknown function) and AT2G26975 (Copper Transporter 6; COPT6) as the strongest candidates for the associations outside MOT1. Our results show that multi-allelic genetic architectures within a single LD-block can lead to variance-heterogeneity between genotypes in natural populations. Further, they provide novel insights into the genetic regulation of ion homeostasis in A. thaliana, and empirically confirm that variance-heterogeneity based GWA methods are a valuable tool to detect novel associations of biological importance in natural populations.
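As a rough illustration of what a variance-based association signal measures, the sketch below computes a Brown-Forsythe statistic (Levene's test centred on group medians) for phenotype values grouped by genotype. The abstract does not say which variance test the authors used, so the choice of test, the function name `brown_forsythe_statistic`, and the toy data are all assumptions made for illustration.

```python
from statistics import fmean, median


def brown_forsythe_statistic(groups):
    """Brown-Forsythe W statistic for equal variances across groups.

    `groups` maps a genotype label to a list of phenotype values.
    Assumed test choice; not necessarily what the study used.
    """
    # Absolute deviation of each value from its own group's median.
    deviations = []
    for values in groups.values():
        center = median(values)
        deviations.append([abs(y - center) for y in values])

    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    group_means = [fmean(d) for d in deviations]
    grand_mean = fmean(z for d in deviations for z in d)

    # One-way ANOVA F ratio computed on the absolute deviations.
    between = sum(
        len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, group_means)
    )
    within = sum(
        (z - m) ** 2 for d, m in zip(deviations, group_means) for z in d
    )
    return (n_total - k) / (k - 1) * between / within


# Hypothetical data: similar means, very different spread per genotype.
toy = {
    "low_variance_allele": [10.0, 10.5, 9.5, 10.2, 9.8],
    "high_variance_allele": [6.0, 14.0, 10.0, 3.0, 17.0],
}
w_stat = brown_forsythe_statistic(toy)
```

A large statistic indicates that the spread of the trait differs between genotype groups even when the group means are similar, which is the kind of signal a mean-based scan would miss.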
FIQT: a simple, powerful method to accurately estimate effect sizes in genome scans
Tim B Bigdeli, Donghyung Lee, Brien P Riley, Vladimir I Vladimirov, Ayman H Fanous, Kenneth S Kendler, Silviu-Alin Bacanu
Genome scans, including both genome-wide association studies and deep sequencing, continue to discover a growing number of significant association signals for various traits. However, often variants meeting genome-wide significance criteria explain far less of the overall trait variance than “sub-threshold” association signals. To extract these sub-threshold signals, there is a need for methods which accurately estimate the mean of all (normally distributed) test-statistics from a genome scan (i.e., Z-scores). This is currently achieved by the difficult procedures of adjusting all Z-score (χ_1^2) statistics for “winner’s curse” (multiple testing). Given that multiple testing adjustments are much simpler for p-values, we propose a method for estimating Z-score means by i) first adjusting their p-values for multiple testing and then ii) transforming the adjusted p-values to upper tail Z-scores with the sign of the original statistics. Because a False Discovery Rate (FDR) procedure is used for multiple testing adjustment, we denote this method FDR Inverse Quantile Transformation (FIQT). When compared to competitors, e.g. Empirical Bayes (including proposed improvements), FIQT is more i) accurate and ii) computationally efficient by orders of magnitude. Its accuracy advantage is substantial at larger sample sizes and/or moderate numbers of association signals. Practical application of FIQT to Z-scores from the first Psychiatric Genomics Consortium (PGC) schizophrenia study predicts a non-trivial fraction of the significant signal regions from the subsequent published PGC schizophrenia studies. Finally, we suggest that FIQT might be i) used to improve subject level risk prediction and ii) further improved by modelling the noncentrality of χ_1^2 statistics.
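The two steps described above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes two-sided p-values, assumes the Benjamini-Hochberg procedure as the FDR adjustment (the abstract does not name one), and the helper names `bh_adjust` and `fiqt` and the `min_p` guard are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def fiqt(z_scores, min_p=1e-300):
    """Shrink Z-scores: FDR-adjust p-values, then map back to signed Z.

    `min_p` is an assumed guard against p-values underflowing to zero.
    """
    # Step i: two-sided p-values, adjusted for multiple testing.
    pvals = [max(2.0 * _STD_NORMAL.cdf(-abs(z)), min_p) for z in z_scores]
    qvals = bh_adjust(pvals)
    # Step ii: upper-tail quantile of each adjusted p-value, original sign.
    shrunk = []
    for z, q in zip(z_scores, qvals):
        magnitude = -_STD_NORMAL.inv_cdf(q / 2.0)
        shrunk.append(math.copysign(magnitude, z))
    return shrunk


# Hypothetical Z-scores from a tiny scan.
adjusted_z = fiqt([5.0, -3.0, 1.0, 0.2, -0.5])
```

Because an adjusted p-value is never smaller than the raw one, each returned Z-score has the same sign as its input and an absolute value no larger than the original, which is the shrinkage that counters winner's curse.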
Integration of experiments across diverse environments identifies the genetic determinants of variation in Sorghum bicolor seed element composition
Nadia Shakoor, Greg Ziegler, Brian P Dilkes, Zachary Brenton, Richard Boyles, Erin L Connolly, Stephen Kresovich, Ivan Baxter
Seedling establishment and seed nutritional quality require the sequestration of sufficient mineral nutrients. Identification of genes and alleles that modify element content in the grains of cereals, including Sorghum bicolor, is fundamental to developing breeding and selection methods aimed at increasing bioavailable mineral content and improving crop growth. We have developed a high throughput workflow for the simultaneous measurement of multiple elements in Sorghum seeds. We measured seed element levels in the genotyped Sorghum Association Panel (SAP), representing all major cultivated sorghum races from diverse geographic and climatic regions, and mapped alleles contributing to seed element variation across three environments by genome-wide association. We observed significant phenotypic and genetic correlation between several elements across multiple years and diverse environments. The power of combining high-precision measurements with genome-wide association was demonstrated by implementing rank transformation and a multilocus mixed model (MLMM) to map alleles controlling 20 element traits, identifying 255 loci affecting the sorghum seed ionome. Sequence similarity to genes characterized in previous studies identified likely causative genes for the accumulation of zinc (Zn), manganese (Mn), nickel (Ni), calcium (Ca) and cadmium (Cd) in sorghum seed. In addition to strong candidates for these five elements, we provide a list of candidate loci for several other elements. Our approach enabled identification of SNPs in strong LD with causative polymorphisms that can be used directly in plant breeding and improvement.
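The abstract mentions a rank transformation before association mapping but does not specify which one. A common choice for skewed trait measurements is the rank-based inverse normal transform, sketched below under that assumption; the function name `rank_inverse_normal`, the Blom offset of 0.375, and the sample values are illustrative and not taken from the study.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.375):
    """Map values to normal quantiles by rank (ties share an average rank).

    Assumed variant: Blom's formula, (rank - offset) / (n - 2*offset + 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n

    # Assign 1-based ranks, averaging over runs of tied values.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    std_normal = NormalDist()
    denominator = n - 2.0 * offset + 1.0
    return [std_normal.inv_cdf((r - offset) / denominator) for r in ranks]


# Hypothetical element concentrations with one extreme value and one tie.
transformed = rank_inverse_normal([3.2, 150.0, 0.7, 12.5, 12.5])
```

The transform keeps the ordering of the samples but removes the influence of extreme measurements, so a single outlying seed sample cannot dominate the association statistics.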
The “Gini index” in genetics: measuring genetic architecture complexity of quantitative traits
Genetic architecture is a general term that is widely used in complex trait genetics. It refers to the number of functional loci that explain variation in a complex trait and the distribution of genetic effects across these loci. Understanding how complex the genetic architecture of a trait is matters for evaluating the potential power to map functional loci and to predict the trait. However, there has been no quantitative measure of genetic architecture complexity, which makes it difficult to link results from genetic data analysis to the term. Inspired by the “Gini index” for measuring income distribution in economics, I develop a genetic architecture score (“GA score”) to measure genetic architecture complexity. Simulations indicate that the GA score is an effective measure of the complexity of the genetic architecture of complex traits.
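For readers unfamiliar with the economic measure that inspired the score, the sketch below computes the standard Gini coefficient over a list of non-negative per-locus contributions. This is the textbook Gini coefficient only; the abstract does not define the GA score, so nothing here should be read as the author's formula. The function name `gini_coefficient` and the example effect profiles are hypothetical.

```python
def gini_coefficient(values):
    """Standard Gini coefficient of non-negative values.

    0 means every locus contributes equally; values near 1 mean a few
    loci carry almost all of the total.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    # Rank-weighted sum over the ascending values (ranks are 1-based).
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical shares of genetic variance explained per locus.
few_large_loci = [0.50, 0.30, 0.10, 0.05, 0.05]
many_equal_loci = [0.20, 0.20, 0.20, 0.20, 0.20]

g_uneven = gini_coefficient(few_large_loci)
g_even = gini_coefficient(many_equal_loci)
```

In this toy comparison the profile dominated by a few loci yields the higher coefficient, which shows how an inequality measure can summarise the distribution of effects across loci in a single number.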
Genetic Basis of Transcriptome Diversity in Drosophila melanogaster
Wen Huang, Mary Anna Carbone, Michael Magwire, Jason Peiffer, Richard Lyman, Eric Stone, Robert Anholt, Trudy Mackay
Understanding how DNA sequence variation is translated into variation for complex phenotypes has remained elusive, but is essential for predicting adaptive evolution, selecting agriculturally important animals and crops, and personalized medicine. Here, we quantified genome-wide variation in gene expression in the sequenced inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP). We found that a substantial fraction of the Drosophila transcriptome is genetically variable and organized into modules of genetically correlated transcripts, which provide functional context for newly identified transcribed regions. We identified regulatory variants for the mean and variance of gene expression, the latter of which could often be explained by an epistatic model. Expression quantitative trait loci for the mean, but not the variance, of gene expression were concentrated near genes. This comprehensive characterization of population scale diversity of transcriptomes and its genetic basis in the DGRP is critically important for a systems understanding of quantitative trait variation.
Fulfilling the promise of Mendelian randomization
Many important questions in medicine are questions about causality. For example, do low levels of high-density lipoproteins (HDL) cause heart disease? Does high body mass index (BMI) cause type 2 diabetes? Or are these traits simply correlated in the population for other reasons? A popular approach to answering these questions using human genetics is called “Mendelian randomization”. We discuss the prospects and limitations of this approach, and some ways forward.
Genomic prediction of celiac disease targeting HLA-positive individuals
Gad Abraham, Alexia Rohmer, Jason A Tye-Din, Michael Inouye
Background: Genomic prediction aims to leverage genome-wide genetic data towards better disease diagnostics and risk scores. We have previously published a genomic risk score (GRS) for celiac disease (CD), a common and highly heritable autoimmune disease, which differentiates between CD cases and population-based controls at a clinically-relevant predictive level, improving upon other gene-based approaches. HLA risk haplotypes, particularly HLA-DQ2.5, are necessary but not sufficient for CD, with at least one HLA risk haplotype present in up to half of most Caucasian populations. Here, we assess a genomic prediction strategy that specifically targets this common genetic susceptibility subtype, utilizing a supervised learning procedure for CD that leverages known HLA-DQ2.5 risk. Methods: Using L1/L2-regularized support-vector machines trained on large European case-control datasets, we constructed novel CD GRSs specific to individuals with HLA-DQ2.5 risk haplotypes (GRS-DQ2.5) and compared them with the predictive power of the existing CD GRS (GRS14) as well as two haplotype-based approaches, externally validating the results in a North American case-control study. Results: Consistent with previous observations, both the existing GRS14 and the GRS-DQ2.5 had better predictive performance than the HLA haplotype approaches. GRS-DQ2.5 models, based on directly genotyped or imputed markers, achieved similar levels of predictive performance (AUC = 0.718–0.73), which were substantially higher than those obtained from the DQ2.5 zygosity alone (AUC = 0.558), the HLA risk haplotype method (AUC = 0.634), or the generic GRS14 (AUC = 0.679). In a screening model of at-risk individuals, the GRS-DQ2.5 lowered the number of unnecessary follow-up tests for CD across most sensitivity levels. Relative to a baseline implicating all DQ2.5-positive individuals for follow-up, the GRS-DQ2.5 resulted in a net saving of 2.2 unnecessary follow-up tests for each justified test while still capturing 90% of DQ2.5-positive CD cases. Conclusions: Genomic risk scores for CD that target genetically at-risk sub-groups improve predictive performance beyond traditional approaches and may represent a useful strategy for prioritizing individuals at increased risk of disease, thus potentially reducing unnecessary follow-up diagnostic tests.
Threshold trait architecture of Hsp90-buffered variation
Charles C Carey, Kristen F Gorman, Becky Howsmon, Charles Kooperberg, Aaron K Aragaki, Suzannah Rutherford
Common genetic variants buffered by Hsp90 are candidates for human diseases of signaling such as cancer. Like cancer, morphological abnormalities buffered by Hsp90 are discrete threshold traits with a continuous underlying basis of liability determining their probability of occurrence. QTL and deletion maps for one of the most frequent Hsp90-dependent abnormalities in Drosophila, deformed eye (dfe), were replicated across three genetically related artificial selection lines using strategies dependent on proximity to the dfe threshold and the direction of genetic and environmental effects. Up to 17 dfe loci (QTL) linked by 7 interactions were detected based on the ability of small recombinant regions of an unaffected and completely homozygous control genotype to dominantly suppress or enhance dfe penetrance at its threshold in groups of isogenic recombinant flies, and over 20 deletions increased dfe penetrance from a low expected value in one or more lines, identifying a complex network of genes responsible for the dfe phenotype. Replicated comparisons of these whole-genome mapping approaches identified several QTL regions narrowly defined by deletions and 4 candidate genes, with additional uncorrelated QTL and deletions highlighting differences between the approaches and the need for caution in attributing the effect of deletions directly to QTL genes.
The origins of a novel butterfly wing patterning gene from within a family of conserved cell cycle regulators
Nicola Nadeau, Carolina Pardo-Diaz, Annabel Whibley, Megan Ann Supple, Richard Wallbank, Grace C. Wu, Luana Maroja, Laura Ferguson, Heather Hines, Camilo Salazar, Richard ffrench-Constant, Mathieu Joron, William Owen McMillan, Chris Jiggins
A major challenge in evolutionary biology is to understand the origins of novel structures. The wing patterns of butterflies and moths are derived phenotypes unique to the Lepidoptera. Here we identify a gene that we name poikilomousa (poik), which regulates colour pattern switches in the mimetic Heliconius butterflies. Strong associations between phenotypic variation and DNA sequence variation are seen in three different Heliconius species, in addition to associations between gene expression and colour pattern. Colour pattern variants are also associated with differences in splicing of poik transcripts. poik is a member of the conserved fizzy family of cell cycle regulators. It belongs to a faster evolving subfamily, the closest functionally characterised orthologue being the cortex gene in Drosophila, a female germ-line specific protein involved in meiosis. poik appears to have adopted a novel function in the Lepidoptera and become a major target for natural selection acting on colour and pattern variation in this group.