This post is by Sriram Sankararaman, Nick Patterson, Heng Li, Svante Pääbo, and David Reich on their paper "The date of interbreeding between Neandertals and modern humans", arXived here.
The relationship between modern humans and archaic hominins such as Neandertals has been the subject of intense debate. The sequencing of a Neandertal genome a couple of years ago (Green et al, Science 2010) showed that Neandertals are more closely related to non-African genomes than to African genomes. One model consistent with this observation involves gene flow from Neandertals into modern non-Africans after the divergence of African and non-African populations. Another model that can explain the observation is one in which the population ancestral to modern humans and Neandertals was structured. For example, imagine that this ancestral population consisted of three groups, A, B and C, representing the ancestors of modern Africans, non-Africans and Neandertals respectively. The extra proximity of Neandertals to non-Africans over Africans could arise if A exchanged genes with B, and B with C, after which C diverged to form Neandertals, while A and B did not hybridize completely before diverging to form Africans and non-Africans.
The Neandertal (Green et al, Science 2010) and Denisova (Reich et al, Nature 2010) genome papers considered both models: either scenario was shown to produce the skew in the observed D-statistics (a measure of the excess sharing of alleles across groups) that makes Neandertals appear closer to non-Africans than to Africans. Indeed, a recent paper (Eriksson and Manica, PNAS 2012) used an Approximate Bayesian Computation framework with D-statistics as the summary statistics and arrived at similar conclusions.
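To make the D-statistic concrete, here is a minimal sketch of one common form of it, the ABBA-BABA count. The function name, the input encoding (0 for ancestral, 1 for derived) and the toy site counts are all assumptions made for illustration, not taken from the papers cited above. With P1 = African, P2 = non-African, P3 = Neandertal and an outgroup fixing the ancestral state, a positive value means the non-African sample shares more derived alleles with the Neandertal than the African sample does.

```python
def d_statistic(sites):
    """Compute an ABBA-BABA style D-statistic.

    `sites` is an iterable of (p1, p2, p3, outgroup) alleles,
    each coded 0 (ancestral) or 1 (derived). This encoding is an
    assumption of this sketch.
    """
    abba = 0  # P2 and P3 share the derived allele, P1 is ancestral
    baba = 0  # P1 and P3 share the derived allele, P2 is ancestral
    for p1, p2, p3, outgroup in sites:
        if outgroup != 0:
            continue  # the outgroup is taken to define the ancestral state
        if (p1, p2, p3) == (0, 1, 1):
            abba += 1
        elif (p1, p2, p3) == (1, 0, 1):
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


# Toy data (hypothetical counts): 6 ABBA sites, 4 BABA sites,
# and 5 uninformative sites that do not enter the statistic.
toy_sites = [(0, 1, 1, 0)] * 6 + [(1, 0, 1, 0)] * 4 + [(1, 1, 0, 0)] * 5
print(d_statistic(toy_sites))  # 0.2
```

Note that a skewed D on its own says nothing about which of the two models produced it, which is the point made above.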
A paper from Monty Slatkin’s group (Yang et al, MBE 2012) attempted to differentiate the two scenarios by using the site frequency spectrum. Yang et al considered the site frequency spectrum in Europeans conditioned on observing a derived allele in Neandertal and an ancestral allele in Africans (termed the doubly-conditioned frequency spectrum, dcfs). They used theory and simulations to show that an ancient structure model produces a linear dcfs. On the other hand, they showed that recent gene flow can produce an excess of rare variants, which matches the observed dcfs. Interestingly, they also observed that bottlenecks after gene flow had the effect of making the dcfs linear, suggesting that gene flow from Neandertals could not have preceded strong bottlenecks in the non-African populations.
A different idea that we explored was to ask whether patterns of linkage disequilibrium (LD) might discriminate between the two scenarios. If we could pick out haplotypes that came into modern humans from Neandertals, recombination is expected to break these haplotypes down at a fixed rate every generation (assuming neutrality). Haplotypes that came in 1,000 generations ago (under recent gene flow) should be about 10 times longer on average than haplotypes that came in 10,000 generations ago (under ancient structure). And if we could measure LD precisely enough, we could even date these ancient events. To do so, we had to address two technical challenges: i) measures of LD can be sensitive to demographic events, and ii) for events that occurred thousands of generations ago, we need to measure LD at size scales at which genetic maps can be quite noisy, and this noise can bias estimates of dates.
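The factor of 10 can be checked with a back-of-the-envelope sketch. It assumes a deliberately simplified model that is not spelled out in the post: after t generations, the distance from a site to the nearest recombination breakpoint is exponentially distributed with rate t per Morgan, so the mean is 1/t Morgans, or 100/t centimorgans. The function names and the simulation settings below are hypothetical.

```python
import random


def expected_tract_length_cm(generations):
    """Mean distance to the nearest breakpoint, in centimorgans,
    under the simplified exponential model (an assumption of this sketch)."""
    return 100.0 / generations


def simulated_mean_length_cm(generations, n_draws=100_000, seed=0):
    """Monte Carlo check: draw lengths in Morgans from an exponential
    distribution with rate `generations`, then convert to centimorgans."""
    rng = random.Random(seed)
    total = sum(rng.expovariate(generations) for _ in range(n_draws))
    return 100.0 * total / n_draws


for t in (1_000, 10_000):
    print(t, expected_tract_length_cm(t), simulated_mean_length_cm(t))
```

Under this model the expected lengths are 0.1 cM and 0.01 cM, which shows why dating events this old requires measuring LD at very fine genetic distances.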
Theory indicates that the expected LD (measured by Lewontin’s D) across SNPs that arose on the Neandertal lineage and introgressed decays exponentially with genetic distance, at a rate given by the time of gene flow, and that this decay is robust to demographic events. In practice the result need not hold exactly, because these SNPs can only be ascertained imperfectly. We therefore ran simulations to show that the decay of LD nevertheless provides accurate estimates and can differentiate gene flow from ancient structure. We also developed a model to assess errors in genetic maps, which we then used to obtain a corrected date.
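The dating idea can be illustrated with a minimal sketch. It assumes the idealized form D(x) = A exp(-t x), with x the genetic distance in Morgans and t the time since gene flow in generations, and recovers t by ordinary least squares on log(D). The function name, the amplitude and the value t = 2000 are hypothetical toy choices. This is not the estimator used in the paper, which also has to handle noisy LD estimates, imperfect ascertainment and genetic map errors.

```python
import math


def fit_decay_rate(distances_morgans, ld_values):
    """Estimate t in D(x) = A * exp(-t * x) by regressing log(D) on x.

    Requires strictly positive LD values; returns minus the fitted slope,
    interpreted as generations since gene flow (toy model only).
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Noise-free toy curve: distances from 0.01 cM to 0.5 cM, hypothetical t and A.
true_t = 2000
distances = [i * 0.0001 for i in range(1, 51)]
ld = [0.05 * math.exp(-true_t * d) for d in distances]

estimated_t = fit_decay_rate(distances, ld)
print(round(estimated_t))  # 2000
```

On real data the LD estimates at each distance are noisy and can be negative, so a log-linear fit like this one would not be usable directly; the sketch only shows how the decay rate maps to a date.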
Our results support the recent gene flow scenario, with a likely date of gene flow into the ancestors of modern Europeans of 37,000-86,000 years BP, although this does not exclude the possibility of ancient structure. A broader methodological question we are exploring is whether LD-based analyses might be generally applicable as a tool for dating other ancient gene flow events.
Sriram Sankararaman, Nick Patterson, Heng Li, Svante Pääbo, and David Reich