Hierarchical Bayesian model of population structure reveals convergent adaptation to high altitude in human populations
Matthieu Foll, Oscar E. Gaggiotti, Josephine T. Daub, Laurent Excoffier
(Submitted on 18 Feb 2014)
Detecting genes involved in local adaptation is challenging and of fundamental importance in evolutionary, quantitative, and medical genetics. To this aim, a standard strategy is to perform genome scans in populations of different origins and environments, looking for genomic regions of high differentiation. Because shared population history or population sub-structure may lead to an excess of false positives, analyses are often done on multiple pairs of populations, which leads to i) a global loss of power as compared to a global analysis, and ii) the need for multiple tests corrections. In order to alleviate these problems, we introduce a new hierarchical Bayesian method to detect markers under selection that can deal with complex demographic histories, where sampled populations share part of their history. Simulations show that our approach is both more powerful and less prone to false positive loci than approaches based on separate analyses of pairs of populations or those ignoring existing complex structures. In addition, our method can identify selection occurring at different levels (i.e. population or region-specific adaptation), as well as convergent selection in different regions. We apply our approach to the analysis of a large SNP dataset from low- and high-altitude human populations from America and Asia. The simultaneous analysis of these two geographic areas allows us to identify several new candidate genome regions for altitudinal selection, and we show that convergent evolution among continents has been quite common. In addition to identifying several genes and biological processes involved in high altitude adaptation, we identify two specific biological pathways that could have evolved in both continents to counter toxic effects induced by hypoxia.
This seems like a fun paper, thanks for posting. Small Q after skimming: why take a sliding window approach in practice? It seems like this approach is most powerful if selection affects allele frequencies in a large (500kb) window, but if selection is polygenic (and the individual selected site falls on many haplotypes) it seems plausible that only the single selected site should change frequency. Naively I was expecting something like Figure 3 but with each point being a SNP rather than a region, is this just really noisy?
Fully agree: the sliding window approach is only here to detect strong signals affecting large regions of the genome, and clearly not to identify polygenic selection. And that’s why here we also re-identify the strongest candidate genes previously found (EPAS1, EGLN1).
We wanted to have an integrative approach and to identify selection at different levels, corresponding to the three sub-sections in our results:
– At the SNP level (which is just the raw output from our method)
– At the gene level (using the sliding window)
– At the “polygenic level” (using GO, pathway and gene set enrichment). To be clear: here we again use the raw information from all SNPs and don’t filter using the sliding window.
I think the third one is certainly the one giving the most interesting and biologically meaningful results, but it doesn’t hurt to also show the sliding window result.
And to answer your question: yes, Figure 3 at the SNP level would be quite noisy, and I agree this is also pointing toward polygenic selection.
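For readers who want a concrete picture of the gene-level step discussed above, here is a minimal sketch of how per-SNP output could be summarized in sliding windows. It is not the authors' implementation: the function name `window_summary`, the 250 kb step, the 0.95 cutoff, and the choice of "fraction of SNPs above a cutoff" as the window statistic are all assumptions made for illustration. Only the 500 kb window size comes from the question above.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def window_summary(
    positions: Sequence[int],
    scores: Sequence[float],
    window: int = 500_000,   # window size in bp (the 500kb mentioned above)
    step: int = 250_000,     # hypothetical step size, not from the paper
    cutoff: float = 0.95,    # hypothetical per-SNP score cutoff
) -> List[Tuple[int, int, float]]:
    """Summarize per-SNP scores in sliding windows along one chromosome.

    `positions` must be sorted in ascending order (bp). Returns one tuple
    (window_start, n_snps, fraction_of_snps_at_or_above_cutoff) for every
    window that contains at least one SNP.
    """
    if len(positions) != len(scores):
        raise ValueError("positions and scores must have the same length")
    results: List[Tuple[int, int, float]] = []
    if not positions:
        return results
    start = 0
    last = positions[-1]
    while start <= last:
        # Indices of SNPs falling in the half-open interval [start, start + window)
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + window)
        n = hi - lo
        if n > 0:
            hits = sum(1 for s in scores[lo:hi] if s >= cutoff)
            results.append((start, n, hits / n))
        start += step
    return results


# Toy data: three SNPs close together, one isolated SNP further along.
toy_positions = [100_000, 200_000, 300_000, 900_000]
toy_scores = [0.99, 0.97, 0.10, 0.99]
toy_windows = window_summary(toy_positions, toy_scores)
```

A window in which many neighbouring SNPs score highly stands out in this kind of summary, whereas a single high-scoring SNP in an otherwise unremarkable window does not. That is the sense in which a window scan favours strong signals spanning large regions over polygenic signals.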
Got it, thanks!