Unbiased statistical testing of shared genetic control for potentially related traits

Unbiased statistical testing of shared genetic control for potentially related traits
Chris Wallace
(Submitted on 23 Jan 2013)

Integration of data from genomewide single nucleotide polymorphism (SNP) association studies of different traits should allow researchers to disentangle the genetics of potentially related traits within individually associated regions. Methods have ranged from visual comparison of association $p$ values for each trait to formal statistical colocalisation testing of individual regions, which requires selection of a set of SNPs summarizing the association in a region. We show that the SNP selection method greatly affects type 1 error rates, with all published studies to date having used SNP selection methods that result in substantially biased inference. The primary reasons are twofold: random variation in the prescence of linkage disequilibrium means selected SNPs do not fully capture the association signal, and selecting SNPs on the basis of significance leads to biased effect size estimates.
We show that unbiased inference can be made either by avoiding variable selection and instead testing the most informative principal components or by integrating over variable selection using Bayesian model averaging. Application to data from Graves’ disease and Hashimoto’s thyroiditis reveals a common genetic signature across seven regions shared between the diseases, and indicates that for five out of six regions which have been significantly associated with one disease and not the other, the lack of evidence in one disease represents genuine absence of association rather than lack of power.

3 thoughts on “Unbiased statistical testing of shared genetic control for potentially related traits

  1. This is my first paper on arXiv, and it’s nice to see it picked up here, thanks. Below is a little commentary from our group’s blog which may add a little more rounded impression of the paper than is possible in an abstract.

    We have a new paper on arXiv detailing some work on colocalisation analysis, a method to determine whether two traits share a common causal variant. This is of interest in autoimmune disease genetics as the associated loci of so many autoimmune diseases overlap, but, for some genes, it appears the causal variants are distinct. It is also relevant for integrating disease association and eQTL data, to understand whether association of a disease to a particular locus is mediated by a variant’s effect on expression of a specific gene, possibly in a specific tissue.

    Determining whether traits share a common causal variant as opposed to distinct causal variants, probably in some LD, is not straightforward. It is well established that regression coefficients are aymptotically unbiased. However, when a SNP has been selected because it is the most associated in a region, then coefficients do then tend to be biased away from the null, ie their effect is overestimated. Because SNPs need to be selected to describe the association in any region in order to do colocalisation analysis, and because the coefficient bias will differ between datasets, there could be a tendancy to call truly colocalising traits as distinct. In fact, application of a formal statistical test for colocalisation in a naive manner could have a type 1 error rate around 10-20% for a nominal size of 5%. This of course suggests that our earlier analysis of type 1 diabetes and monocyte gene expression needs to be revised because it is likely we will have falsely rejected some genes which mediate the type 1 diabetes association in a region.

    In this paper, we demonstrate two methods to overcome the problem. One, possibly more attractive to frequentists, is to avoid the variable selection by performing the analysis on principle components which summarise the genetic variation in a region. There is an issue with how many components are required, and our simulations suggest enough components need to be selected to capture around 85% of variation in a region. Obviously, this leads to a huge increase in degrees of freedom but, surprisingly, the power was not much worse compared to our favoured option of averaging p values over the variable selection using Bayesian Model Averaging. The idea of averaging p values is possibly anathema to Bayesians and frequentists alike, but these “posterior predictive p values” do have some history, having been introduced by Rubin in 1984. If you are prepared to mix Bayesian and frequentist theory sufficiently to average a p value over a posterior distribution (in this case, the posterior is of the SNPs which jointly summarise the association to both traits), it’s quite a nice idea. We used it before as an alternative to taking a profile likelihood approach to dealing with a nuisance parameter, instead calculating p values conditional on the nuisance parameter, and averaging over its posterior. In this paper, we show by simulation that it does a good job of maintaining type 1 error and tends to be more powerful than the principle components approach.

    There are many questions regarding integration of data from different GWAS that this paper doesn’t address: how to do this on a genomewide basis, for multiple traits, or when samples are not independent (GWAS which share a common set of controls, for example). Thus, it is a small step, but a useful contribution, I think, demonstrating a statistically sound method of investigating potentially shared causal variants in individual loci in detail. And while detailed investigation of individual loci may be currently be less fashionable than genomewide analyses, those detailed analyses remain crucial for fine resolution analysis of key loci.

  2. Pingback: Our paper: Unbiased statistical testing of shared genetic control for potentially related traits | Haldane's Sieve

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s