SumVg: Total heritability explained by all variants in genome-wide association studies based on summary statistics with standard error estimates
Hon-Cheong SO, Pak C. SHAM
doi: http://dx.doi.org/10.1101/016857

Genome-wide association studies (GWAS) have become increasingly popular these days and one of the key questions is how much heritability could be explained by all variants in GWAS. We have previously proposed an approach to answer this question, based on recovering the “true” z-statistics from a set of observed z-statistics. Only summary statistics are required. However, methods for standard error (SE) estimation are not available yet, thereby limiting the interpretation of the results. In this study we developed resampling-based approaches to estimate the SE and the methods are implemented in an R package. We found that delete-d-jackknife and parametric bootstrap approaches provide good estimates of the SE. Methods to compute the sum of heritability explained and the corresponding SE are implemented in the R package SumVg, available at https://sites.google.com/site/honcheongso/software/var-totalvg
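
The delete-d jackknife mentioned in the abstract can be illustrated with a minimal sketch. This is not the SumVg implementation (which is an R package); the function name, its parameters, and the use of randomly drawn subsets rather than all possible subsets are assumptions made for illustration. It repeatedly drops d observations, recomputes the estimate on the remaining r = n - d, and scales the spread of those estimates by r/d.

```python
import math
import random


def delete_d_jackknife_se(data, estimator, d, n_subsets=500, seed=0):
    """Approximate the SE of `estimator` by the delete-d jackknife.

    Hypothetical helper: `estimator` is any function mapping a list of
    observations to a single number. Random subsets are an assumption
    used here to avoid enumerating every possible deletion.
    """
    n = len(data)
    if not 0 < d < n:
        raise ValueError("d must satisfy 0 < d < len(data)")
    r = n - d  # number of observations retained in each replicate
    rng = random.Random(seed)

    estimates = []
    for _ in range(n_subsets):
        kept = rng.sample(range(n), r)  # indices retained (without replacement)
        estimates.append(estimator([data[i] for i in kept]))

    mean_est = sum(estimates) / len(estimates)
    sum_sq = sum((e - mean_est) ** 2 for e in estimates)
    # Delete-d jackknife variance: r / (d * S) * sum of squared deviations,
    # where S is the number of replicates.
    return math.sqrt(r / (d * n_subsets) * sum_sq)
```

For the sample mean, this estimate should land close to the usual s/sqrt(n); for a heritability estimator one would pass the corresponding function of the z-statistics in place of the mean.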

The advent of genome-wide association studies for bacteria

Peter E Chen, B Jesse Shapiro
doi: http://dx.doi.org/10.1101/016873

Significant advances in sequencing technologies and genome-wide association studies (GWAS) have revealed substantial insight into the genetic architecture of human phenotypes. In recent years, the application of this approach in bacteria has begun to reveal the genetic basis of bacterial host preference, antibiotic resistance, and virulence. Here, we consider relevant differences between bacterial and human genome dynamics, apply GWAS to a global sample of Mycobacterium tuberculosis genomes to highlight the impacts of linkage disequilibrium, population stratification, and natural selection, and finally compare the traditional GWAS against phyC, a contrasting method of mapping genotype to phenotype based upon evolutionary convergence. We discuss strengths and weaknesses of both methods, and make suggestions for factors to be considered in future bacterial GWAS.

Dimensionality and the statistical power of multivariate genome-wide association studies

Eladio J. Marquez, David Houle
doi: http://dx.doi.org/10.1101/016592

Mutations virtually always have pleiotropic effects, yet most genome-wide association studies (GWAS) analyze effects one trait at a time. In order to investigate the performance of a multivariate approach to GWAS, we simulated scenarios where variation in a d-dimensional phenotype space was caused by a known subset of SNPs. Multivariate analyses of variance were then carried out on k traits, where k could be less than, greater than or equal to d. Our results show that power is maximized and false discovery rate (FDR) minimized when the number of traits analyzed, k, matches the true dimensionality of the phenotype being analyzed, d. When true dimensionality is high, the power of a single univariate analysis can be an order of magnitude less than the k=d case, even when the single trait with the largest genetic variance is chosen for analysis. When traits are added to a study in order of their independent genetic variation, the gains in power from increasing k up to d are much larger than the loss in power when k exceeds d. Simulations that explicitly model linkage disequilibrium (LD) indicate that when SNPs in disequilibrium are subjected to multivariate analysis, the magnitude of the apparent effect induced onto null SNPs by SNPs carrying a true effect weakens as k approaches d, such that the rank of P-values among a set of correlated SNPs becomes an increasingly reliable predictor of true positives. Multivariate GWAS outperform univariate ones under a wide range of conditions, and should become the standard in studies of the inheritance of complex phenotypes.

Two variance component model improves genetic prediction in family data sets

George Tucker, Po-Ru Loh, Iona M MacLeod, Ben J Hayes, Michael E Goddard, Bonnie Berger, Alkes L Price
doi: http://dx.doi.org/10.1101/016618

Genetic prediction based on either identity by state (IBS) sharing or pedigree information has been investigated extensively using Best Linear Unbiased Prediction (BLUP) methods. However, methods to combine IBS sharing and pedigree information for genetic prediction in humans have not been explored. We introduce a two variance component model for genetic prediction: one component for IBS sharing and one for approximate pedigree structure, both estimated using genetic markers. In simulations using real genotypes from CARe and FHS family cohorts, we demonstrate that the two variance component model achieves gains in prediction r2 over standard BLUP at current sample sizes, and we project based on simulations that these gains will continue to hold at larger sample sizes. Accordingly, in analyses of four quantitative phenotypes from CARe and two quantitative phenotypes from FHS, the two variance component model significantly improves prediction r2 in each case, with up to a 16% relative improvement. We also find that standard mixed model association tests can produce inflated test statistics in datasets with related individuals, whereas the two variance component model corrects for inflation.

Contrasting regional architectures of schizophrenia and other complex diseases using fast variance components analysis

Po-Ru Loh, Gaurav Bhatia, Alexander Gusev, Hilary K Finucane, Brendan K Bulik-Sullivan, Samuela J Pollack, Schizophrenia Working Group Psychiatric Genomics Consortium, Teresa R de Candia, Sang Hong Lee, Naomi R Wray, Kenneth S Kendler, Michael C O’Donovan, Benjamin M Neale, Nick Patterson, Alkes L Price
doi: http://dx.doi.org/10.1101/016527

Heritability analyses of GWAS cohorts have yielded important insights into complex disease architecture, and increasing sample sizes hold the promise of further discoveries. Here, we analyze the genetic architecture of schizophrenia in 49,806 samples from the PGC, and nine complex diseases in 54,734 samples from the GERA cohort. For schizophrenia, we infer an overwhelmingly polygenic disease architecture in which ≥76% of 1Mb genomic regions harbor at least one variant influencing schizophrenia risk. We also observe significant enrichment of heritability in GC-rich regions and in higher-frequency SNPs for both schizophrenia and GERA diseases. In bivariate analyses, we observe significant genetic correlations (ranging from 0.18 to 0.85) for 13 of 36 pairs of GERA diseases; genetic correlations were consistently stronger (1.3x on average) than correlations of overall disease liabilities. To accomplish these analyses, we developed a novel, fast algorithm for multi-component, multi-trait variance components analysis that overcomes prior computational barriers that made such analyses intractable at this scale.

Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores

Bjarni Vilhjalmsson, Jian Yang, Hilary Kiyo Finucane, Alexander Gusev, Sara Lindstrom, Stephan Ripke, Giulio Genovese, Po-Ru Loh, Gaurav Bhatia, Ron Do, Tristian Hayeck, Hong-Hee Won, Schizophrenia Working Group of the Psychiatric Genomics Consortium, the Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study, Sekar Kathiresan, Michele Pato, Carlos Pato, Rulla Tamimi, Eli Stahl, Noah Zaitlen, Bogdan Pasaniuc, Mikkel Schierup, Phillip De Jager, Nikolaos Patsopoulos, Steven A McCarroll, Mark Daly, Shaun Purcell, Daniel Chasman, Benjamin Neale, Mike Goddard, Peter M Visscher, Peter Kraft, Nick J Patterson, Alkes L Price
doi: http://dx.doi.org/10.1101/015859

Polygenic risk scores have shown great promise in predicting complex disease risk, and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves LD-pruning markers and applying a P-value threshold to association statistics, but this discards information and may reduce predictive accuracy. We introduce a new method, LDpred, which infers the posterior mean causal effect size of each marker using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the pruning/thresholding approach, particularly at large sample sizes. Accordingly, prediction R2 increased from 20.1% to 25.3% in a large schizophrenia data set and from 9.8% to 12.0% in a large multiple sclerosis data set. A similar relative improvement in accuracy was observed for three additional large disease data sets and when predicting in non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
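
The standard pruning/thresholding baseline that the abstract contrasts with LDpred can be sketched roughly as follows. This is not LDpred and not the authors' code; the data layout (dicts with `id`, `beta`, `p`), the pairwise r2 lookup, and the greedy p-value-ordered pruning rule are all assumptions chosen to keep the example small.

```python
def prune_and_threshold(snps, r2, p_max, r2_max):
    """Greedy p-value-ordered LD pruning followed by a P-value cutoff.

    Hypothetical inputs:
      snps   - list of dicts with keys 'id', 'beta', 'p'
      r2     - dict mapping frozenset({id_a, id_b}) to squared correlation;
               missing pairs are treated as unlinked (r2 = 0)
      p_max  - keep only SNPs with p <= p_max
      r2_max - drop a SNP if its r2 with any already-kept SNP is >= r2_max
    """
    kept = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        if snp["p"] > p_max:
            break  # list is sorted, so all remaining SNPs fail the threshold
        in_ld = any(
            r2.get(frozenset((snp["id"], k["id"])), 0.0) >= r2_max for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept


def risk_score(genotype, kept):
    """Sum of effect size times allele count over the retained SNPs."""
    return sum(s["beta"] * genotype[s["id"]] for s in kept)
```

In this sketch the information carried by pruned or sub-threshold markers is simply thrown away, which is the loss the abstract says LDpred is designed to avoid.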

A Pleiotropy-Informed Bayesian False Discovery Rate adapted to a Shared Control Design Finds New Disease Associations From GWAS Summary Statistics

James Liley, Chris Wallace
doi: http://dx.doi.org/10.1101/014886

Genome-wide association studies (GWAS) have been successful in identifying single nucleotide polymorphisms (SNPs) associated with many traits and diseases. However, at existing sample sizes, these variants explain only part of the estimated heritability. Leverage of GWAS results from related phenotypes may improve detection without the need for larger datasets. The Bayesian conditional false discovery rate (cFDR) constitutes an upper bound on the expected false discovery rate (FDR) across a set of SNPs whose p values for two diseases are both less than two disease-specific thresholds. Calculation of the cFDR requires only summary statistics and has several advantages over traditional GWAS analysis. However, existing methods require distinct control samples between studies. Here, we extend the technique to allow for some or all controls to be shared, increasing applicability. Several different SNP sets can be defined with the same cFDR value, and we show that the expected FDR across the union of these sets may exceed expected FDR in any single set. We describe a procedure to establish an upper bound for the expected FDR among the union of such sets of SNPs. We apply our technique to pairwise analysis of p values from ten autoimmune diseases with variable sharing of controls, enabling discovery of 59 SNP-disease associations which do not reach GWAS significance after genomic control in individual datasets. Most of the SNPs we highlight have previously been confirmed using replication studies or larger GWAS, a useful validation of our technique; we report eight SNP-disease associations across five diseases not previously declared. Our technique extends and strengthens the previous algorithm, and establishes robust limits on the expected FDR. This approach can improve SNP detection in GWAS, and give insight into shared aetiology between phenotypically related conditions.
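
As a rough illustration of the quantity the abstract builds on, the sketch below computes a commonly used empirical conditional FDR estimate from two lists of p values: the principal-trait threshold divided by the empirical proportion of SNPs meeting it among those that also meet the conditional-trait threshold. The function name and inputs are hypothetical, and the sketch covers only the basic estimator, not the shared-control adjustment or the bound over unions of SNP sets that the paper introduces.

```python
def empirical_cfdr(p_principal, p_conditional, pi, pj):
    """Empirical cFDR for thresholds (pi, pj) on paired p-value lists.

    Hypothetical helper: p_principal[k] and p_conditional[k] are the
    p values of SNP k for the principal and conditional trait.
    """
    # SNPs passing the conditional-trait threshold
    n_cond = sum(1 for q in p_conditional if q <= pj)
    # SNPs passing both thresholds
    n_both = sum(
        1 for p, q in zip(p_principal, p_conditional) if p <= pi and q <= pj
    )
    if n_both == 0:
        raise ValueError("no SNPs fall below both thresholds")
    # pi divided by the empirical conditional CDF, capped at 1
    return min(1.0, pi * n_cond / n_both)
```

A small value indicates that SNPs below both thresholds are enriched well beyond what the principal-trait threshold alone would predict under the null.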

Mediated pleiotropy between psychiatric disorders and autoimmune disorders revealed by integrative analysis of multiple GWAS

Qian Wang, Can Yang, Joel Gelernter, Hongyu Zhao
doi: http://dx.doi.org/10.1101/014530

Epidemiological observations and molecular-level experiments have indicated that brain disorders in the realm of psychiatry may be influenced by immune dysregulation. However, the degree of genetic overlap between immune disorders and psychiatric disorders has not been well established. We investigated this issue by integrative analysis of genome-wide association studies (GWAS) of 18 complex human traits/diseases (five psychiatric disorders, seven autoimmune disorders, and others) and multiple genome-wide annotation resources (central nervous system genes, immune-related expression-quantitative trait loci (eQTL) and DNase I hypersensitive sites from 98 cell-lines). We detected pleiotropy in 24 of the 35 psychiatric-autoimmune disorder pairs, with statistical significance as strong as p=3.9e-285 (schizophrenia-rheumatoid arthritis). Strong enrichment (>1.4 fold) of immune-related eQTL was observed in four psychiatric disorders. Genomic regions responsible for pleiotropy between psychiatric disorders and autoimmune disorders were detected. The MHC region on chromosome 6 appears to be the most important (and it was indeed previously noted (1-3) as a confluence between schizophrenia and immune disorder risk regions), with many other regions, such as cytoband 1p13.2, also contributing. We also found that most alleles shared between schizophrenia and Crohn’s disease have the same effect direction, with a similar trend found for other disorder pairs, such as bipolar-Crohn’s disease. Our results offer a novel bird’s-eye view of the genetic relationship and demonstrate strong evidence for mediated pleiotropy between psychiatric disorders and autoimmune disorders. Our findings might open new routes for prevention and treatment strategies for these disorders based on a new appreciation of the importance of immunological mechanisms in mediating risk.

An Atlas of Genetic Correlations across Human Diseases and Traits

Brendan Bulik-Sullivan, Hilary K Finucane, Verneri Anttila, Alexander Gusev, Felix R Day, ReproGen Consortium, Psychiatric Genomics Consortium, Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Consortium 3, John R.B. Perry, Nick Patterson, Elise Robinson, Mark J Daly, Alkes L Price, Benjamin M Neale
doi: http://dx.doi.org/10.1101/014498

Identifying genetic correlations between complex traits and diseases can provide useful etiological insights and help prioritize likely causal relationships. The major challenges preventing estimation of genetic correlation from genome-wide association study (GWAS) data with current methods are the lack of availability of individual genotype data and widespread sample overlap among meta-analyses. We circumvent these difficulties by introducing a technique for estimating genetic correlation that requires only GWAS summary statistics and is not biased by sample overlap. We use our method to estimate 300 genetic correlations among 25 traits, totaling more than 1.5 million unique phenotype measurements. Our results include genetic correlations between anorexia nervosa and schizophrenia/body mass index and associations between educational attainment and several diseases. These results highlight the power of a polygenic modeling framework, since there currently are no genome-wide significant SNPs for anorexia nervosa and only three for educational attainment.

A Single Gene Causes an Interspecific Difference in Pigmentation in Drosophila

Yasir H. Ahmed-Braimah, Andrea L. Sweigart
doi: http://dx.doi.org/10.1101/014464

The genetic basis of species differences remains understudied. Studies in insects have contributed significantly to our understanding of morphological evolution. Pigmentation traits in particular have received a great deal of attention and several genes in the insect pigmentation pathway have been implicated in inter- and intraspecific differences. Nonetheless, much remains unknown about many of the genes in this pathway and their potential role in understudied taxa. Here we genetically analyze the puparium color difference between members of the Virilis group of Drosophila. The puparium of Drosophila virilis is black, while those of D. americana, D. novamexicana, and D. lummei are brown. We used a series of backcross hybrid populations between D. americana and D. virilis to map the genomic interval responsible for the difference between this species pair. First, we show that the pupal case color difference is caused by a single Mendelizing factor, which we ultimately map to an ~11kb region on chromosome 5. The mapped interval includes only the first exon and regulatory region(s) of the dopamine N-acetyltransferase gene (Dat). This gene encodes an enzyme that is known to play a part in the insect pigmentation pathway. Second, we show that this gene is highly expressed at the onset of pupation in light-brown taxa (D. americana and D. novamexicana) relative to D. virilis, but not in the dark-brown D. lummei. Finally, we examine the role of Dat in adult pigmentation between D. americana (heavily melanized) and D. novamexicana (lightly melanized) and find no discernible effect of this gene in adults. Our results demonstrate that a single gene is entirely or almost entirely responsible for a morphological difference between species.