The weighting is the hardest part: on the behavior of the likelihood ratio test and score test under weight misspecification in rare variant association studies
Camelia Claudia Minica, Giulio Genovese, Dorret I. Boomsma, Christina M. Hultman, René Pool, Jacqueline M. Vink, Conor V. Dolan, Benjamin M. Neale
Rare variant association studies are gaining importance in human genetic research with the increasing availability of exome/genome sequence data. One important test of association between a target set of rare variants (RVs) and a given phenotype is the sequence kernel association test (SKAT). Assignment of weights reflecting the hypothesized contribution of the RVs to the trait variance is embedded within any set-based test. Since the true weights are generally unknown, it is important to establish the effect of weight misspecification in SKAT. We used simulated and real data to characterize the behavior of the likelihood ratio test (LRT) and the score test under weight misspecification. The results revealed that the LRT is generally more robust to weight misspecification, and more powerful than the score test when the weights are misspecified. For instance, when the rare variants within the target were simulated to have larger betas than the more common ones, incorrect assignment of equal weights reduced the power of the LRT by ~5%, while the power of the score test dropped by ~30%. Furthermore, the LRT was more robust to the inclusion of weighted neutral variation in the test. To optimize weighting, we proposed the use of a data-driven weighting scheme. With this approach and the LRT, we detected significant enrichment of case mutations with minor allele frequency (MAF) below 5% (P-value = 7E-04) in a set of highly constrained genes in the Swedish schizophrenia case-control cohort of 4,940 individuals with observed exome-sequencing data. The score test is currently widely used in sequence kernel association studies for both its computational efficiency and power. Indeed, assuming correct specification, in some circumstances the score test is the most powerful test. However, our results showed that the LRT has the compelling qualities of being generally more robust and more powerful under weight misspecification.
This is an important result, given that misspecified models are arguably the rule rather than the exception in weighting-based approaches.