Shai Carmi, Pier Francesco Palamara, Vladimir Vacic, Todd Lencz, Ariel Darvasi, Itsik Pe’er
(Submitted on 21 Jun 2012)
Widespread sharing of long, identical-by-descent (IBD) genetic segments is a hallmark of populations that have experienced a recent bottleneck. The detection of these IBD segments is now feasible, enabling a wide range of applications from phasing and imputation to demographic inference. Here, we study the distribution of IBD sharing in the Wright-Fisher model. Using coalescent theory, we calculate the mean and variance of the total sharing between arbitrary pairs of individuals. We then study the cohort-averaged sharing: the average total sharing between one individual and the rest of the cohort. We find that for large cohorts, the cohort-averaged sharing is distributed approximately normally. Surprisingly, the variance of this distribution remains large even for large cohorts, implying the existence of “hyper-sharing” individuals. The presence of such individuals has important consequences for the design of sequencing studies, since, if they are selected for whole-genome sequencing, a larger fraction of the cohort can be subsequently imputed. We calculate the expected gain in power of imputation by IBD, and subsequently, in power to detect an association, when individuals are either randomly selected or are specifically the hyper-sharing individuals. Finally, we study the distribution of pairwise sharing and cohort-averaged sharing in the Ashkenazi Jewish population.
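
As a minimal illustration of the cohort-averaged sharing defined above, the sketch below computes it from a matrix of pairwise total sharing and ranks individuals by it. The function names, the toy matrix, and the use of a simple top-k ranking to pick "hyper-sharing" individuals are assumptions made for illustration only; they are not taken from the paper, which derives these quantities analytically.

```python
def cohort_averaged_sharing(pairwise):
    """Average total sharing of each individual with the rest of the cohort.

    `pairwise` is assumed to be a symmetric n x n list of lists, where
    pairwise[i][j] is the total IBD sharing between individuals i and j.
    Diagonal entries are ignored.
    """
    n = len(pairwise)
    if n < 2:
        raise ValueError("need at least two individuals")
    return [
        sum(pairwise[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def top_sharers(pairwise, k):
    """Indices of the k individuals with the largest cohort-averaged sharing.

    Hypothetical selection rule: a plain ranking, used here only to
    illustrate the idea of choosing hyper-sharing individuals.
    """
    avg = cohort_averaged_sharing(pairwise)
    return sorted(range(len(avg)), key=lambda i: avg[i], reverse=True)[:k]


# Toy data (made up): pairwise total sharing for a cohort of four.
sharing = [
    [0, 10, 20, 30],
    [10, 0, 5, 15],
    [20, 5, 0, 25],
    [30, 15, 25, 0],
]

averages = cohort_averaged_sharing(sharing)
selected = top_sharers(sharing, 2)
```

In this toy cohort, individuals 3 and 0 have the largest averages, so a ranking-based design would pick them for sequencing first.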