Footprints of ancient balanced polymorphisms in genetic variation data
Ziyue Gao, Molly Przeworski, Guy Sella
(Submitted on 29 Jan 2014)
When long-lived, balancing selection can lead to trans-species polymorphisms that are shared by two or more species identical by descent. In this case, the gene genealogies at the selected sites cluster by allele instead of by species and, because of linkage, nearby neutral sites also have unusual genealogies. Although it is clear that this scenario should lead to discernible footprints in genetic variation data, notably the presence of additional neutral polymorphisms shared between species and the absence of fixed differences, the effects remain poorly characterized. We focus on the case of a single site under long-lived balancing selection and derive approximations for summaries of the data that are sensitive to a trans-species polymorphism: the length of the segment that carries most of the signals, the expected number of shared neutral SNPs within the segment and the patterns of allelic associations among them. Coalescent simulations of ancient balancing selection confirm the accuracy of our approximations. We further show that for humans and chimpanzees, and more generally for pairs of species with low genetic diversity levels, the patterns of genetic variation on which we focus are highly unlikely to be generated by neutral recurrent mutations, so these statistics are specific as well as sensitive. We discuss the implications of our results for the design and interpretation of genome scans for ancient balancing selection in apes and other taxa.