Estimating Reproducibility in Genome-Wide Association Studies
Wei Jiang, Jing-Hao Xue, Weichuan Yu
Genome-wide association studies (GWAS) are widely used to discover genetic variants associated with diseases. To control false positives, all findings from GWAS need to be verified with additional evidence, even for associations discovered by a high-power study. A replication study, which uses independent samples, is a common verification method. An association is regarded as a true positive with high confidence when it is identified in both the primary study and the replication study. Currently, there is no systematic study of how positives behave in the replication study when the positive results of the primary study are used as prior information.
In this paper, we propose two probabilistic measures, the Reproducibility Rate (RR) and the False Irreproducibility Rate (FIR), to quantitatively describe the behavior of primary positive associations (i.e., positive associations identified in the primary study) in the replication study. RR is a conditional probability measuring how likely a primary positive association is to be positive in the replication study as well. It can be used to guide the design of the replication study and to check the consistency between the results of the primary study and those of the replication study. FIR, in contrast, measures how likely a primary positive association is to still be a true positive even when it is negative in the replication study. It can be used to generate a list of potentially true associations among the irreproducible findings for further scrutiny. We give methods for estimating both measures. Simulations and experiments on real data show that our estimation methods are highly accurate and have good predictive performance.
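RR and FIR are defined here only as conditional probabilities. To make the two definitions concrete, the following is a minimal sketch under a deliberately simplified two-group model that is an assumption for illustration, not the estimation method of this paper: every true association is detected with one fixed power, every null association is declared positive with a fixed type I error rate, and the two studies are independent given whether an association is true. Under these assumptions RR = q·power2 + (1 − q)·alpha2 and FIR = q(1 − power2) / [q(1 − power2) + (1 − q)(1 − alpha2)], where q is the posterior probability that a primary positive is true. The names `TwoStageModel`, `pi1`, `power1`, `alpha1`, `power2`, `alpha2` and `simulate` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class TwoStageModel:
    """Hypothetical two-group model (illustrative assumption only).

    Each association is truly associated with prior probability pi1, else null.
    A true association is positive with a fixed power; a null one is positive
    with probability alpha (type I error). The two studies are independent
    given the true status of the association.
    """
    pi1: float      # prior proportion of true associations
    power1: float   # detection power in the primary study
    alpha1: float   # type I error rate in the primary study
    power2: float   # detection power in the replication study
    alpha2: float   # type I error rate in the replication study

    def posterior_true(self) -> float:
        """q = P(true | positive in primary study), by Bayes' rule."""
        tp = self.pi1 * self.power1
        fp = (1.0 - self.pi1) * self.alpha1
        return tp / (tp + fp)

    def rr(self) -> float:
        """RR = P(positive in replication | positive in primary)."""
        q = self.posterior_true()
        return q * self.power2 + (1.0 - q) * self.alpha2

    def fir(self) -> float:
        """FIR = P(true | positive in primary, negative in replication)."""
        q = self.posterior_true()
        true_miss = q * (1.0 - self.power2)
        null_miss = (1.0 - q) * (1.0 - self.alpha2)
        return true_miss / (true_miss + null_miss)


def simulate(model: TwoStageModel, n: int, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo check of rr() and fir() under the same hypothetical model."""
    rng = random.Random(seed)
    primary_pos = replicated = 0
    irreproducible = irreproducible_true = 0
    for _ in range(n):
        is_true = rng.random() < model.pi1
        p1 = rng.random() < (model.power1 if is_true else model.alpha1)
        if not p1:
            continue  # RR and FIR only concern primary positives
        primary_pos += 1
        p2 = rng.random() < (model.power2 if is_true else model.alpha2)
        if p2:
            replicated += 1
        else:
            irreproducible += 1
            irreproducible_true += is_true  # bool counts as 0 or 1
    return replicated / primary_pos, irreproducible_true / irreproducible


if __name__ == "__main__":
    m = TwoStageModel(pi1=0.5, power1=0.8, alpha1=0.2, power2=0.5, alpha2=0.1)
    print(f"analytic   RR={m.rr():.3f}  FIR={m.fir():.3f}")
    rr_sim, fir_sim = simulate(m, 200_000)
    print(f"simulated  RR={rr_sim:.3f}  FIR={fir_sim:.3f}")
```

With these toy parameters q = 0.8, so RR = 0.42 and FIR = 0.4 / 0.58 ≈ 0.69, and the simulated frequencies should agree closely. Real GWAS effects vary in size, so power is not a single constant in practice; the sketch only shows what the two conditional probabilities measure.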