The Population Genetic Signature of Polygenic Local Adaptation
Jeremy J. Berg, Graham Coop
(Submitted on 29 Jul 2013)
Adaptation in response to selection on polygenic phenotypes occurs via subtle allele frequency shifts at many loci. Current population genomic techniques are not well suited to identifying such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that have undergone local adaptation. Using GWAS data, we estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We model the expected differentiation of GWAS loci among populations under neutrality to develop simple tests of selection across an arbitrary number of populations with arbitrary population structure. To find support for the role of specific environmental variables in local adaptation, we test for correlations with the estimated genetic values. We also develop a general test of local adaptation to identify overdispersion of the estimated genetic values among populations. This test is a natural generalization of QST/FST comparisons based on GWAS predictions. Finally, we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single-locus equivalents because they look for positive covariance between like-effect alleles. We apply our tests to the Human Genome Diversity Panel dataset using GWAS data for six different traits. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
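The two core computations in the abstract (genetic values as weighted sums of allele frequencies, and a test for their overdispersion relative to neutral drift) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes: `alpha` holds per-SNP additive effect sizes for the allele whose frequency is tabulated; `freqs` is a populations-by-SNPs frequency matrix; the ancestral frequency is approximated by the cross-population mean; and `F` is the covariance matrix of mean-centered, variance-standardized neutral frequencies with the last population dropped. Under those assumptions a Qx-style statistic, Z'ᵀF⁻¹Z'/(2V_A) with V_A = 2Σα²ε(1-ε), is compared with a chi-squared distribution on M-1 degrees of freedom. All function names are hypothetical.

```python
import math
import numpy as np

def genetic_values(alpha, freqs):
    """Estimated mean additive genetic value per population: Z_m = 2 * sum_l alpha_l * p_ml."""
    return 2.0 * np.asarray(freqs, dtype=float) @ np.asarray(alpha, dtype=float)

def chi2_sf(x, k):
    """Upper tail P(X > x) for a chi-squared variable with integer df k (closed forms, no SciPy)."""
    y = x / 2.0
    if k % 2 == 0:
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(k // 2))
    # Odd df: regularized upper gamma at half-integer shape, built up from erfc.
    tail = math.erfc(math.sqrt(y))
    return tail + math.exp(-y) * sum(
        y**(i - 0.5) / math.gamma(i + 0.5) for i in range(1, (k - 1) // 2 + 1)
    )

def qx_test(alpha, freqs, F):
    """Qx-style overdispersion test (sketch). F is assumed to be the (M-1)x(M-1)
    covariance of mean-centered, standardized neutral frequencies, last population dropped."""
    alpha = np.asarray(alpha, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    M = freqs.shape[0]
    eps = freqs.mean(axis=0)                      # stand-in for ancestral frequencies
    V_A = 2.0 * np.sum(alpha**2 * eps * (1.0 - eps))
    Z = genetic_values(alpha, freqs)
    Zc = (Z - Z.mean())[:-1]                      # mean-center, drop the last population
    Qx = float(Zc @ np.linalg.solve(F, Zc)) / (2.0 * V_A)
    return Qx, chi2_sf(Qx, M - 1)

# Toy inputs (not real data): one SNP, three populations, F = 0.1 * identity.
Qx, p = qx_test([1.0], [[0.2], [0.4], [0.6]], 0.1 * np.eye(2))
print(Qx, p)
```

With real data the effect sizes would come from a GWAS and `F` from genome-wide, putatively neutral SNPs; a large Qx (small p) indicates more spread in genetic values than drift alone predicts.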
Having had some prior exposure to this work from conversations with Graham and Jeremy and after skimming the article, one thing comes to my mind. Is there any way to involve the actual measured phenotypes in this kind of analysis? It seems like the situation in a lot of cases will be that some of the genetic basis of a trait is known but a large fraction is unknown—similar to the skin pigmentation case shown here. We already know from comparative phylogenetics that looking for selection solely on the basis of phenotypic measurement is possible and this paper shows that when we understand the genetic basis of a trait we can do it too. Perhaps some way of leveraging both types of data together could result in a robust and powerful test for selection.
I’m also a bit worried about the fact that the variants detected in a GWAS are almost certainly not THE causal variants. In some sense, I feel that the “genetic correlation” thing in Fig 1 gets at this problem and I think it’s unlikely that it would produce false positives, but I wonder if you have done any more direct simulations with recombination.
The authors make that clear in the assumptions in saying that “For the majority of GWAS associations we do not know the causal variant(s) at a locus, but rather a SNP that is in linkage disequilibrium (LD) with the causal variant(s).”
Yeah, it occurred to me at some point that if one was willing to do Qst/Fst among populations without hierarchical structure, using estimates from phenotypic data, then you could easily just estimate the correlation structure (the F matrix in our notation) instead of F_ST, and just get the scaling term from standard quant gen methods. Ovaskainen and colleagues (in a paper I only recently became aware of) actually worked through some of this a few years ago in a multi-trait case, although they don’t seem to have noted that the distribution they were simulating from in figure 3 was actually -1*chi^2.
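Estimating the F matrix rather than a single F_ST can be sketched as below. This is one common estimator, assumed here for illustration rather than taken from either paper: mean-center each putatively neutral SNP across populations, scale by sqrt(ε(1-ε)) with ε the cross-population mean, drop one population (centering makes the full matrix singular), and average the outer products over SNPs. The function name and the choice to skip monomorphic SNPs are assumptions.

```python
import numpy as np

def estimate_F(neutral_freqs):
    """Hypothetical helper. neutral_freqs: M x K allele frequencies at K neutral SNPs.
    Returns the (M-1)x(M-1) covariance of mean-centered, standardized frequencies."""
    P = np.asarray(neutral_freqs, dtype=float)
    eps = P.mean(axis=0)                        # per-SNP cross-population mean frequency
    keep = (eps > 0.0) & (eps < 1.0)            # monomorphic SNPs carry no information
    X = (P[:, keep] - eps[keep]) / np.sqrt(eps[keep] * (1.0 - eps[keep]))
    X = X[:-1, :]                               # drop the last population to avoid singularity
    return X @ X.T / X.shape[1]

# Simulated frequencies for four populations at 2000 SNPs (illustration only).
rng = np.random.default_rng(0)
F_hat = estimate_F(rng.uniform(0.05, 0.95, size=(4, 2000)))
print(F_hat.shape)
```

The off-diagonal entries are what carry the hierarchical information that a single F_ST averages away.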
We probably could/should do a bit more to reinforce this point though, as you’ll always do better by taking the hierarchy of relationships into account rather than just averaging over them (as standard Qst/Fst analyses of multiple populations would).
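One concrete cost of averaging over the hierarchy can be shown with standard Gaussian quadratic-form results: for z ~ N(0, Σ) and symmetric A, E[zᵀAz] = tr(AΣ) and Var[zᵀAz] = 2 tr((AΣ)²). The sketch below uses a made-up covariance for two pairs of sister populations and ignores the mean-centering step for simplicity. Weighting by the full inverse gives a chi-squared statistic; using only the diagonal (a stand-in for treating populations as exchangeable) keeps the same mean but inflates the variance, so a chi-squared reference would be too narrow and overcall selection.

```python
import numpy as np

def quad_form_moments(A, Sigma):
    """Mean and variance of z^T A z for z ~ N(0, Sigma), with A symmetric."""
    AS = A @ Sigma
    return float(np.trace(AS)), float(2.0 * np.trace(AS @ AS))

# Hypothetical structure: two pairs of closely related populations (illustrative values).
F = np.array([[0.10, 0.08, 0.00, 0.00],
              [0.08, 0.10, 0.00, 0.00],
              [0.00, 0.00, 0.10, 0.08],
              [0.00, 0.00, 0.08, 0.10]])

full_mean, full_var = quad_form_moments(np.linalg.inv(F), F)          # chi-squared, 4 df
diag_mean, diag_var = quad_form_moments(np.diag(1.0 / np.diag(F)), F)  # covariances ignored
print(full_mean, full_var)   # approximately 4 and 8
print(diag_mean, diag_var)   # approximately 4 and 13.12
```

The stronger the within-pair covariance, the larger this variance inflation becomes.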
It also occurred to me just in the last few days that maybe one could learn more by analyzing both real phenotypic data and the estimated genetic values jointly, as you suggest, but I haven’t sat down and worked through exactly what one gains from that. Probably this isn’t feasible for humans, because we don’t have control over the environment, and thus it’d be hard to say for certain that disagreements between the two datasets weren’t just environmental, but in systems where we can manipulate individuals it may be handy.
With regard to the GWAS SNPs not being causal: We’ve not done any explicit simulations with recombination, but because all we’re doing is rejecting a neutral model of drift, there doesn’t seem to be any way this should result in false positives. I think your intuition is right that figure 1 addresses this at least conceptually (although obviously not explicitly in the context of recombination).
I suppose you might imagine that it adds to the interpretation headache if the scale of LD is shorter in some populations than others, as you might be able to see selection in some populations (e.g. non-Africans), but not others (e.g. Africans). Again, it shouldn’t cause false positives, but you might imagine getting the story somewhat wrong.
Thanks for your comments. Just to follow up on Jeremy’s comments: if we really knew the additive genetic value of individuals for a phenotype, then we could do better than we are doing (likely just by using a variant of QST). The problem is that for most species we can’t do common garden experiments, nor the crosses needed to assess the additive variance. I think these assumptions often get swept under the rug in phylogenetic studies [although obviously the better analyses are aware of that fact].
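For reference, the classical QST uses exactly the quantities that require common gardens or crosses: QST = σ²_B / (σ²_B + 2σ²_W), where σ²_B and σ²_W are the between- and within-population additive genetic variances. A small sketch (function names hypothetical) also shows the between-population variance at which QST equals FST, the usual neutral benchmark: σ²_B = 2·FST·σ²_W / (1 - FST).

```python
def qst(var_between, var_within):
    """Classical QST from between- and within-population additive variances."""
    return var_between / (var_between + 2.0 * var_within)

def neutral_var_between(fst, var_within):
    """Between-population additive variance that makes QST equal to FST."""
    return 2.0 * fst * var_within / (1.0 - fst)

vb = neutral_var_between(0.15, 1.0)
print(qst(vb, 1.0))  # 0.15 up to floating-point rounding
```

A QST well above FST suggests divergent selection, but only if both variance components are genuinely additive genetic estimates.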
Jeremy and Graham,
That’s a good point that the different variance components in certain populations make it difficult to untangle the effects of environment vs. genes. I think that this is especially important at the shorter time scales considered in this kind of analysis vs. a more macroevolutionary analysis. I have some intuition that knowing a partial genetic basis for the trait could help to disentangle these things if you are willing to make some simplifying assumptions about the nature of the environmental effects (along the lines of a mixed effects model). But that intuition may well be wrong.
Also, that Ovaskainen paper is quite neat. I shall take a look at it.
Very interesting work in the Berg and Coop paper! As pointed out in some of the comments above, parts of this work come close to our 2011 paper in Genetics (http://www.genetics.org/content/189/2/621). We don’t take a GWAS point of view, though; instead we use (neutral) genetic markers to estimate the relatedness matrix at the population level, as explained in more detail in another Genetics paper by Markku Karhunen and myself in 2012 (http://www.genetics.org/content/192/2/609).
In his comment above, Jeremy is correct that in our 2011 paper we simulate a distribution that could be computed analytically. We have since fixed this in a software paper by Markku Karhunen et al. in Molecular Ecology Resources 2013 (http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12111/abstract), which implements our method as an R package.
Another interesting connection, or convergent line of thinking, is that in a forthcoming paper by Markku Karhunen et al. (Evolution, in press) we take an approach very similar to Berg and Coop’s to look at environmental correlations.
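The environmental-correlation idea can be sketched by first decorrelating both the genetic values and the environmental variable with the Cholesky factor of F, so that under neutrality the transformed genetic values are roughly independent, and then computing an ordinary correlation. This is an illustrative assumption about how such a test can be set up, not a transcription of either paper's method; in particular, the sketch computes only the statistic, and calibrating it (for example against genetic values built from random, frequency-matched SNPs) is left out. `F` is assumed to be the (M-1)x(M-1) covariance of mean-centered frequencies with the last population dropped.

```python
import numpy as np

def decorrelated_env_correlation(Z, env, F):
    """Hypothetical helper: Pearson correlation between genetic values and an
    environmental variable after removing the population covariance structure F."""
    Zc = (np.asarray(Z, dtype=float) - np.mean(Z))[:-1]     # mean-center, drop last population
    Ec = (np.asarray(env, dtype=float) - np.mean(env))[:-1]
    L = np.linalg.cholesky(F)                                # F = L @ L.T
    Zt = np.linalg.solve(L, Zc)                              # decorrelated genetic values
    Et = np.linalg.solve(L, Ec)                              # environment on the same scale
    return float(np.corrcoef(Zt, Et)[0, 1])

# Toy example: four populations, genetic values exactly linear in the environment.
F_toy = 0.1 * np.eye(3) + 0.02
env = np.array([1.0, 2.0, 4.0, 8.0])
r = decorrelated_env_correlation(3.0 * env + 2.0, env, F_toy)
print(r)
```

Transforming the environmental variable along with the genetic values matters: otherwise shared ancestry that happens to track geography can masquerade as an environmental association.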
Thanks for your comments. We somehow didn’t stumble on your 2011 paper until Peter Fields pointed it out to us a few days before we submitted, but it was nice to see you guys have been advancing this stuff in a more traditional quantitative genetics setting as well. I’ve been thinking about some of our stuff in a G matrix context (e.g. GWAS for multiple traits with partially overlapping genetic bases), and the first thing that stood out to me in getting acquainted with a lot of that literature is the lack of explicit information about the relatedness matrix, so it’s great to see you guys pushing that forward.