Guillaume Pare, Shihong Mao, Wei Deng
Despite considerable efforts, known genetic associations explain only a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can increase the variance explained at established association loci. However, regional associations are not easily estimated from summary association statistics because they are sensitive to linkage disequilibrium (LD). We propose a novel method to estimate the phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to multiple regression models when no interaction or haplotype effects are present. It has multiple applications, such as ranking genetic regions according to variance explained and deriving regional gene scores (GS). We show that most genetic variance lies in a small proportion of the genome, and that GS derived from regional associations can improve trait prediction beyond that of optimal polygenic scores. Our results also suggest that regional associations underlie known linkage peaks.
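To make the role of LD concrete, the following is a minimal sketch of the multiple-regression quantity mentioned above, not the estimator proposed here. It assumes a standardized trait and standardized genotypes, an LD (correlation) matrix R that is known and invertible, and marginal SNP-trait correlations r as the summary statistics. Under those assumptions the variance explained jointly by the region is r^T R^{-1} r. The function names are hypothetical, and sampling error is ignored.

```python
import numpy as np


def regional_r2(marginal_corr, ld):
    """Joint variance explained by a region, r^T R^{-1} r.

    marginal_corr: marginal SNP-trait correlations (assumed summary statistics).
    ld: SNP-by-SNP correlation (LD) matrix, assumed known and invertible.
    """
    r = np.asarray(marginal_corr, dtype=float)
    R = np.asarray(ld, dtype=float)
    # Solve R x = r instead of forming the explicit inverse of R.
    return float(r @ np.linalg.solve(R, r))


def naive_r2(marginal_corr):
    """Sum of squared marginal correlations; correct only when SNPs are uncorrelated."""
    r = np.asarray(marginal_corr, dtype=float)
    return float(np.sum(r ** 2))


if __name__ == "__main__":
    # Illustrative values only: two SNPs in LD 0.8, each with marginal correlation 0.1.
    r = [0.1, 0.1]
    ld = [[1.0, 0.8], [0.8, 1.0]]
    print("naive sum of r^2:", naive_r2(r))
    print("LD-adjusted R^2: ", regional_r2(r, ld))
```

In this toy case the naive sum counts the shared signal twice, while the LD-adjusted value is smaller because the two variants are largely redundant. With an identity LD matrix the two quantities coincide.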