Identification of Slco1a6 as a candidate gene that broadly affects gene expression in mouse pancreatic islets
Jianan Tian, Mark Keller, Angie Oler, Mary Rabagalia, Kathryn Schueler, Donald Stapleton, Aimee Teo Broman, Wen Zhao, Christina Kendziorski, Brian S. Yandell, Bruno Hagenbuch, Karl W Broman, Alan D. Attie
We surveyed gene expression in six tissues in an F2 intercross between mouse strains C57BL/6J (abbreviated B6) and BTBR T+ tf/J (abbreviated BTBR) made genetically obese with the Leptin(ob) mutation. We identified a number of expression quantitative trait loci (eQTL) affecting the expression of numerous genes distal to the locus, called trans-eQTL hotspots. Some of these trans-eQTL hotspots showed effects in multiple tissues, whereas some were specific to a single tissue. An unusually large number of transcripts (7% of genes) mapped in trans to a hotspot on chromosome 6, specifically in pancreatic islets. By considering the first two principal components of the expression of genes mapping to this region, we were able to convert the multivariate phenotype into a simple Mendelian trait. Fine-mapping the locus by traditional methods reduced the QTL interval to a 298 kb region containing only three genes, including Slco1a6, one member of a large family of organic anion transporters. Direct genomic sequencing of all Slco1a6 exons identified a non-synonymous coding SNP that converts a highly conserved proline residue at amino acid position 564 to serine. Molecular modeling suggests that Pro564 faces an aqueous pore within this 12-transmembrane domain-spanning protein. When transiently overexpressed in HEK293 cells, BTBR OATP1A6-mediated cellular uptake of the bile acid taurocholic acid (TCA) was enhanced compared to B6 OATP1A6. Our results suggest that genetic variation in Slco1a6 leads to altered transport of TCA (and potentially other bile acids) by pancreatic islets, resulting in broad gene regulation.
The Multi-allelic Genetic Architecture of a Variance-heterogeneity Locus for Molybdenum Accumulation Acts as a Source of Unexplained Additive Genetic Variance
Simon K G Forsberg, Matthew E Andreatta, Xin-Yuan Huang, John Danku, David E Salt, Örjan Carlborg
Most biological traits are regulated by both genetic and environmental factors. Individual loci contributing to the phenotypic diversity in a population are generally identified by their contributions to the trait mean. Genome-wide association (GWA) analyses can also detect loci based on variance differences between genotypes and several hypotheses have been proposed regarding the possible genetic mechanisms leading to such signals. Little is, however, known about what causes them and whether this genetic variance-heterogeneity reflects mechanisms of importance in natural populations. Previously, we identified a variance-heterogeneity GWA (vGWA) signal for leaf molybdenum concentrations in Arabidopsis thaliana. Here, fine-mapping of this association to a ~78 kb Linkage Disequilibrium (LD)-block reveals that it emerges from the independent effects of three genetic polymorphisms on the high-variance associated version of this LD-block. By revealing the genetic architecture underlying this vGWA signal, we uncovered the molecular source of a significant amount of hidden additive genetic variation (“missing heritability”). Two of the three polymorphisms on the high-variance LD-block are promoter variants for Molybdate transporter 1 (MOT1), and the third a variant located ~25 kb downstream of this gene. A fourth independent association was also detected ~600 kb upstream of the LD-block. Testing of T-DNA knockout alleles for genes in the associated regions suggests AT2G25660 (unknown function) and AT2G26975 (Copper Transporter 6; COPT6) as the strongest candidates for the associations outside MOT1. Our results show that multi-allelic genetic architectures within a single LD-block can lead to a variance-heterogeneity between genotypes in natural populations. Further, they provide novel insights into the genetic regulation of ion homeostasis in A. thaliana, and empirically confirm that variance-heterogeneity based GWA methods are a valuable tool to detect novel associations of biological importance in natural populations.
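To make the idea of detecting a locus by variance differences between genotypes concrete, the sketch below computes a Brown-Forsythe statistic, which compares the spread of a trait across genotype groups. The abstract does not say which test the authors used, so the choice of test, the function name, and the toy values are all assumptions made for illustration.

```python
from statistics import mean, median


def brown_forsythe_statistic(groups):
    """Brown-Forsythe F statistic for equality of variances across groups.

    `groups` is a list of lists of trait values, one list per genotype.
    Illustrative only: not necessarily the test used in the study.
    Raises ZeroDivisionError if every group has identical deviations.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of each observation from its own group median.
    deviations = []
    for g in groups:
        centre = median(g)
        deviations.append([abs(y - centre) for y in g])

    group_means = [mean(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total

    # One-way ANOVA on the absolute deviations.
    between = sum(
        len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, group_means)
    )
    within = sum(
        (z - m) ** 2 for d, m in zip(deviations, group_means) for z in d
    )
    return (n_total - k) / (k - 1) * between / within


# Made-up trait values for two hypothetical genotype classes.
low_variance_genotype = [9, 10, 11]
high_variance_genotype = [0, 10, 20]
statistic = brown_forsythe_statistic(
    [low_variance_genotype, high_variance_genotype]
)
print(statistic)
```

A large statistic indicates that the genotype groups differ in spread even when, as in the toy data, their medians are identical. Turning the statistic into a p-value would require the F distribution with (k - 1, N - k) degrees of freedom, which is left out here.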
FIQT: a simple, powerful method to accurately estimate effect sizes in genome scans
Tim B Bigdeli, Donghyung Lee, Brien P Riley, Vladimir I Vladimirov, Ayman H Fanous, Kenneth S Kendler, Silviu-Alin Bacanu
Genome scans, including both genome-wide association studies and deep sequencing, continue to discover a growing number of significant association signals for various traits. However, variants meeting genome-wide significance criteria often explain far less of the overall trait variance than “sub-threshold” association signals. To extract these sub-threshold signals, there is a need for methods which accurately estimate the mean of all (normally-distributed) test-statistics from a genome scan (i.e., Z-scores). This is currently achieved by the difficult procedures of adjusting all Z-score (χ_1^2) statistics for “winner’s curse” (multiple testing). Given that multiple testing adjustments are much simpler for p-values, we propose a method for estimating Z-score means by i) first adjusting their p-values for multiple testing and then ii) transforming the adjusted p-values to upper tail Z-scores with the sign of the original statistics. Because a False Discovery Rate (FDR) procedure is used for multiple testing adjustment, we denote this method FDR Inverse Quantile Transformation (FIQT). When compared to competitors, e.g. Empirical Bayes (including proposed improvements), FIQT is more i) accurate and ii) computationally efficient by orders of magnitude. Its accuracy advantage is substantial at larger sample sizes and/or moderate numbers of association signals. Practical application of FIQT to Z-scores from the first Psychiatric Genetic Consortium (PGC) schizophrenia study predicts a non-trivial fraction of the significant signal regions from the subsequent published PGC schizophrenia studies. Finally, we suggest that FIQT might be i) used to improve subject level risk prediction and ii) further improved by modelling the noncentrality of χ_1^2 statistics.
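The two steps described above (adjust the p-values, then map them back to signed Z-scores) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract says only that an FDR procedure is used, so the Benjamini-Hochberg adjustment is an assumption, and the function names and the `min_p` floor are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def fiqt_sketch(z_scores, min_p=1e-300):
    """Shrink Z-scores via FDR-adjusted p-values and the inverse normal quantile.

    `min_p` is a hypothetical floor that keeps p-values strictly positive.
    """
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvals = [max(erfc(abs(z) / sqrt(2.0)), min_p) for z in z_scores]
    adjusted = bh_adjust(pvals)

    shrunk = []
    for z, p_adj in zip(z_scores, adjusted):
        # Upper-tail quantile of p_adj / 2, computed from the lower tail
        # to stay accurate for very small p-values.
        magnitude = abs(_STD_NORMAL.inv_cdf(p_adj / 2.0))
        shrunk.append(magnitude if z >= 0 else -magnitude)
    return shrunk


# Made-up Z-scores for illustration.
example_z = [5.0, -3.0, 0.5, -0.2]
print(fiqt_sketch(example_z))
```

Because an adjusted p-value is never smaller than the raw one, each output keeps the sign of its input and has a magnitude no larger than the original Z-score, which is the shrinkage the winner's-curse correction aims for.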
Integration of experiments across diverse environments identifies the genetic determinants of variation in Sorghum bicolor seed element composition
Nadia Shakoor, Greg Ziegler, Brian P Dilkes, Zachary Brenton, Richard Boyles, Erin L Connolly, Stephen Kresovich, Ivan Baxter
Seedling establishment and seed nutritional quality require the sequestration of sufficient mineral nutrients. Identification of genes and alleles that modify element content in the grains of cereals, including Sorghum bicolor, is fundamental to developing breeding and selection methods aimed at increasing bioavailable mineral content and improving crop growth. We have developed a high throughput workflow for the simultaneous measurement of multiple elements in Sorghum seeds. We measured seed element levels in the genotyped Sorghum Association Panel (SAP), representing all major cultivated sorghum races from diverse geographic and climatic regions, and mapped alleles contributing to seed element variation across three environments by genome-wide association. We observed significant phenotypic and genetic correlation between several elements across multiple years and diverse environments. The power of combining high-precision measurements with genome-wide association was demonstrated by implementing rank transformation and a multilocus mixed model (MLMM) to map alleles controlling 20 element traits, identifying 255 loci affecting the sorghum seed ionome. Sequence similarity to genes characterized in previous studies identified likely causative genes for the accumulation of zinc (Zn), manganese (Mn), nickel (Ni), calcium (Ca) and cadmium (Cd) in sorghum seed. In addition to strong candidates for these five elements, we provide a list of candidate loci for several other elements. Our approach enabled identification of SNPs in strong LD with causative polymorphisms that can be used directly in plant breeding and improvement.
The “Gini index” in genetics: measuring genetic architecture complexity of quantitative traits
Genetic architecture is a general term that is used and discussed often in complex-trait genetics. It refers to the number of functional loci that explain variation in a complex trait and the distribution of genetic effects across these loci. Understanding the complexity of a trait's genetic architecture is essential for evaluating the potential power to map functional loci and to predict complex traits. However, there has been no quantitative measure of genetic architecture complexity, which makes it difficult to link results from genetic data analysis to this term. Inspired by the “Gini index” for measuring income distribution in economics, I develop a genetic architecture score (“GA score”) to measure genetic architecture complexity. Simulations indicate that the GA score is an effective measure of the complexity of the genetic architecture of complex traits.
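The abstract does not define the GA score, so the sketch below shows only the classical Gini index that inspired it, applied to made-up per-locus contributions to trait variance. The function name and the inputs are hypothetical, and this should not be read as the author's formula.

```python
def gini_index(values):
    """Classical Gini index of non-negative values (0 = perfectly even).

    Uses the sorted-rank form:
        G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n
    where x_(i) is the i-th smallest value and i starts at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or xs[0] < 0 or total <= 0:
        raise ValueError("need non-negative values with a positive sum")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical per-locus shares of genetic variance for two toy traits.
many_equal_loci = [0.25, 0.25, 0.25, 0.25]   # effects spread evenly
one_major_locus = [0.0, 0.0, 0.0, 1.0]       # one locus explains everything

print(gini_index(many_equal_loci))
print(gini_index(one_major_locus))
```

Under this illustration, a trait whose variance is spread evenly over many loci scores near zero, while a trait dominated by a single locus scores near its maximum of (n - 1) / n. How the GA score maps this kind of inequality onto "complexity" is specified in the paper itself, not here.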
Genetic Basis of Transcriptome Diversity in Drosophila melanogaster
Wen Huang, Mary Anna Carbone, Michael Magwire, Jason Peiffer, Richard Lyman, Eric Stone, Robert Anholt, Trudy Mackay
Understanding how DNA sequence variation is translated into variation for complex phenotypes has remained elusive, but is essential for predicting adaptive evolution, selecting agriculturally important animals and crops, and personalized medicine. Here, we quantified genome-wide variation in gene expression in the sequenced inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP). We found that a substantial fraction of the Drosophila transcriptome is genetically variable and organized into modules of genetically correlated transcripts, which provide functional context for newly identified transcribed regions. We identified regulatory variants for the mean and variance of gene expression, the latter of which could often be explained by an epistatic model. Expression quantitative trait loci for the mean, but not the variance, of gene expression were concentrated near genes. This comprehensive characterization of population scale diversity of transcriptomes and its genetic basis in the DGRP is critically important for a systems understanding of quantitative trait variation.
Fulfilling the promise of Mendelian randomization
Many important questions in medicine are questions about causality. For example, do low levels of high-density lipoproteins (HDL) cause heart disease? Does high body mass index (BMI) cause type 2 diabetes? Or are these traits simply correlated in the population for other reasons? A popular approach to answering these questions using human genetics is called “Mendelian randomization”. We discuss the prospects and limitations of this approach, and some ways forward.