Using Volcano Plots and Regularized-Chi Statistics in Genetic Association Studies
Wentian Li, Jan Freudenberg, Young Ju Suh, Yaning Yang
(Submitted on 28 Aug 2013)
Labor intensive experiments are typically required to identify the causal disease variants from a list of disease associated variants in the genome. For designing such experiments, candidate variants are ranked by their strength of genetic association with the disease. However, the two commonly used measures of genetic association, the odds-ratio (OR) and p-value, may rank variants in different order. To integrate these two measures into a single analysis, here we transfer the volcano plot methodology from gene expression analysis to genetic association studies. In its original setting, volcano plots are scatter plots of fold-change and t-test statistic (or -log of the p-value), with the latter being more sensitive to sample size. In genetic association studies, the OR and Pearson’s chi-square statistic (or equivalently its square root, chi; or the standardized log(OR)) can be analogously used in a volcano plot, allowing for their visual inspection. Moreover, the geometric interpretation of these plots leads to an intuitive method for filtering results by a combination of both OR and chi-square statistic, which we term “regularized-chi”. This method selects associated markers by a smooth curve in the volcano plot instead of the right-angled lines which corresponds to independent cutoffs for OR and chi-square statistic. The regularized-chi incorporates relatively more signals from variants with lower minor-allele-frequencies than chi-square test statistic. As rare variants tend to have stronger functional effects, regularized-chi is better suited to the task of prioritization of candidate genes.