Dimensionality and the statistical power of multivariate genome-wide association studies

Eladio J. Marquez, David Houle

doi: http://dx.doi.org/10.1101/016592

Mutations virtually always have pleiotropic effects, yet most genome-wide association studies (GWAS) analyze effects one trait at a time. To investigate the performance of a multivariate approach to GWAS, we simulated scenarios in which variation in a d-dimensional phenotype space was caused by a known subset of SNPs. Multivariate analyses of variance were then carried out on k traits, where k could be less than, equal to, or greater than d. Our results show that power is maximized and the false discovery rate (FDR) minimized when the number of traits analyzed, k, matches the true dimensionality of the phenotype, d. When the true dimensionality is high, the power of a single univariate analysis can be an order of magnitude lower than in the k = d case, even when the single trait with the largest genetic variance is chosen for analysis. When traits are added to a study in order of their independent genetic variation, the gains in power from increasing k up to d are much larger than the loss in power when k exceeds d. Simulations that explicitly model linkage disequilibrium (LD) indicate that when SNPs in disequilibrium are subjected to multivariate analysis, the apparent effect that SNPs carrying a true effect induce on null SNPs weakens as k approaches d, so that the rank of P-values among a set of correlated SNPs becomes an increasingly reliable predictor of true positives. Multivariate GWAS outperform univariate ones under a wide range of conditions and should become the standard in studies of the inheritance of complex phenotypes.
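The kind of test described above can be illustrated with a minimal sketch. It is not the authors' simulation code: the function names, the allele frequency of 0.3, the uniform effect size, the uncorrelated unit-variance noise, and the rule that a SNP affects only the first d of the k measured traits are all assumptions made for illustration. The sketch simulates a single causal SNP and computes Wilks' lambda for a multivariate regression of k traits on genotype, together with the exact F statistic that applies when the predictor has one degree of freedom.

```python
import numpy as np


def simulate_phenotypes(n, d, k, effect, rng):
    """Simulate n individuals with one SNP that affects the first min(d, k) of k traits.

    Hypothetical setup: allele frequency 0.3 and independent unit-variance noise
    are illustrative assumptions, not values taken from the paper.
    """
    # Genotypes coded as minor-allele counts (0, 1, 2).
    genotype = rng.binomial(2, 0.3, size=n).astype(float)
    # Additive effect vector: nonzero only in the truly affected dimensions.
    beta = np.zeros(k)
    beta[: min(d, k)] = effect
    noise = rng.standard_normal((n, k))
    traits = genotype[:, None] * beta + noise
    return genotype, traits


def wilks_f(genotype, traits):
    """Return Wilks' lambda, its exact F statistic, and the F degrees of freedom
    for a regression of all traits on a single 1-df predictor."""
    n, k = traits.shape
    x = genotype - genotype.mean()
    y = traits - traits.mean(axis=0)
    sxx = x @ x
    b = (x @ y) / sxx                 # per-trait regression slopes, shape (k,)
    resid = y - np.outer(x, b)
    E = resid.T @ resid               # error sums-of-squares-and-cross-products
    H = sxx * np.outer(b, b)          # hypothesis SSCP matrix (rank 1)
    # Lambda = det(E) / det(E + H), computed on the log scale for stability.
    _, logdet_e = np.linalg.slogdet(E)
    _, logdet_t = np.linalg.slogdet(E + H)
    lam = float(np.exp(logdet_e - logdet_t))
    # Exact F transform for a 1-df hypothesis with error df = n - 2.
    f_stat = (1.0 - lam) / lam * (n - k - 1) / k
    return lam, f_stat, (k, n - k - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d = 3
    for k in (1, 3, 6):               # k < d, k = d, k > d
        g, y = simulate_phenotypes(n=1000, d=d, k=k, effect=0.1, rng=rng)
        lam, f_stat, df = wilks_f(g, y)
        print(f"k={k}: lambda={lam:.4f}, F={f_stat:.2f}, df={df}")
```

Repeating such a test over many simulated SNPs, some with true effects and some without, and converting F to P-values would give empirical estimates of power and FDR for each k. With k = 1 the statistic reduces to the ordinary univariate regression F test, which makes the univariate analysis a special case of the same procedure.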