Winner's curse correction and variable thresholding improve performance of polygenic risk modeling based on summary-level data from genome-wide association studies
Heritability analysis suggests that genome-wide association studies (GWAS) have the potential to improve genetic risk prediction for complex diseases. The polygenic risk score (PRS) is a widely used modelling technique that requires only summary-level data from the discovery samples. We propose two modifications to improve the performance of PRS. First, we propose threshold-dependent winner's curse adjustments for the marginal association coefficients that are used to weight the SNPs in PRS. Second, to exploit external functional/annotation knowledge that might identify subsets of SNPs highly enriched for association signals, we consider using variable thresholds for SNP selection. We applied our methods to GWAS summary-level data for fourteen complex diseases. Our analysis shows that while a simple winner's curse correction uniformly improves model performance across traits, incorporating functional SNPs was beneficial for only selected traits. Compared to the standard PRS algorithm, the proposed methods in combination lead to a substantial efficiency gain (25-50% increase in the prediction R2) for five of the fourteen diseases. As an example, for the GWAS of type 2 diabetes, the lasso-based winner's curse correction improves prediction R2 from 2.29% based on standard PRS to 3.1% (P=0.0017), and incorporating functional annotation data further improves R2 to 3.53% (P=2.0E-5). Our simulation studies further clarify why differential treatment of certain categories of functional SNPs, even when these are shown to be highly enriched for GWAS heritability, does not lead to a proportionate improvement in genetic risk prediction, because of non-uniform linkage disequilibrium structure.
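The two modifications can be illustrated with a minimal sketch. It assumes that the lasso-based correction takes the form of soft thresholding, in which each marginal coefficient is shrunk toward zero by the effect size corresponding to the selection P-value threshold, and that variable thresholding means a more lenient threshold for annotated SNPs. The abstract does not give the exact estimator, so the function names, the soft-thresholding form, and the threshold values below are illustrative assumptions rather than the authors' implementation.

```python
from statistics import NormalDist


def soft_threshold_weights(betas, ses, p_threshold):
    """Shrink marginal coefficients by the selection cutoff (assumed form).

    A SNP passes selection when |beta / se| exceeds the two-sided normal
    quantile for p_threshold; its weight is the excess over that cutoff,
    and SNPs below the cutoff get weight zero.
    """
    z_cut = NormalDist().inv_cdf(1 - p_threshold / 2)
    weights = []
    for beta, se in zip(betas, ses):
        cutoff = z_cut * se  # cutoff on the effect-size scale
        shrunk = max(abs(beta) - cutoff, 0.0)
        weights.append(shrunk if beta >= 0 else -shrunk)
    return weights


def variable_threshold_weights(betas, ses, is_annotated, p_annot, p_other):
    """Apply a separate P-value threshold to annotated and other SNPs."""
    weights = []
    for beta, se, annotated in zip(betas, ses, is_annotated):
        p = p_annot if annotated else p_other
        weights.append(soft_threshold_weights([beta], [se], p)[0])
    return weights


def polygenic_score(genotypes, weights):
    """Weighted sum of one individual's allele counts (0, 1 or 2)."""
    return sum(g * w for g, w in zip(genotypes, weights))


# Hypothetical summary statistics for three SNPs; the second is annotated.
betas = [0.30, -0.25, 0.05]
ses = [0.05, 0.05, 0.05]
annotated = [False, True, False]

weights = variable_threshold_weights(
    betas, ses, annotated, p_annot=1e-4, p_other=5e-8
)
score = polygenic_score([2, 1, 0], weights)
```

In this toy example the second SNP would be dropped under a single genome-wide threshold of 5e-8 but is retained, with a shrunken weight, under the more lenient threshold assumed for annotated SNPs.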