Joshua G. Schraiber, Stephannie Shih, Montgomery Slatkin
(Submitted on 26 Sep 2012)
We examine the distribution of heterozygous sites in nine European and nine Yoruban individuals whose genomic sequences were made publicly available by Complete Genomics. We show that it is possible to obtain detailed information about inbreeding when only a relatively small set of whole-genome sequences is available. Rather than testing for deviations from Hardy-Weinberg genotype frequencies at each site, we analyze the entire distribution of heterozygotes conditioned on the number of copies of the derived (non-chimpanzee) allele. Using Levene’s exact test, we reject Hardy-Weinberg in both populations. We generalize Levene’s distribution to obtain the exact distribution of the number of heterozygous individuals given that every individual has the same inbreeding coefficient, F. We estimate F to be 0.0026 in Europeans and 0.0005 in Yorubans, but we also reject the hypothesis that F is the same in each individual. We use a composite likelihood method to estimate F in each individual and within each chromosome. Variation in F across chromosomes within individuals is too large to be consistent with sampling effects alone. Furthermore, estimates of F for each chromosome are not correlated between the two populations. Our results show how detailed comparisons of population genomic data can be made to theoretical predictions. Applying our methods to the Complete Genomics data set shows that the extent of apparent inbreeding varies across chromosomes and across individuals, and that estimates of inbreeding coefficients are subject to unexpected levels of variation that might be partly accounted for by selection.
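For readers unfamiliar with the distribution underlying Levene's exact test, the following is a minimal sketch of its standard Hardy-Weinberg form: the probability of observing h heterozygotes among n diploid individuals, conditional on the sample containing n_a copies of one allele. The function names `levene_pmf` and `expected_heterozygotes` are illustrative choices, not taken from the paper, and the sketch does not implement the authors' generalization to a nonzero inbreeding coefficient F.

```python
from fractions import Fraction
from math import factorial


def levene_pmf(n, n_a):
    """Levene's distribution under Hardy-Weinberg (F = 0).

    Returns a dict mapping h (number of heterozygotes) to its exact
    probability, given n diploid individuals and n_a copies of allele A
    among the 2n sampled alleles.
    """
    n_b = 2 * n - n_a  # copies of the other allele
    pmf = {}
    # h must share the parity of n_a so both homozygote counts are integers.
    for h in range(n_a % 2, min(n_a, n_b) + 1, 2):
        n_aa = (n_a - h) // 2  # AA homozygotes
        n_bb = (n_b - h) // 2  # BB homozygotes
        numerator = factorial(n) * factorial(n_a) * factorial(n_b) * 2**h
        denominator = (
            factorial(n_aa) * factorial(h) * factorial(n_bb) * factorial(2 * n)
        )
        pmf[h] = Fraction(numerator, denominator)
    return pmf


def expected_heterozygotes(n, n_a):
    """Mean of the distribution, computed directly from the pmf."""
    return sum(h * p for h, p in levene_pmf(n, n_a).items())


if __name__ == "__main__":
    # Nine individuals, matching the per-population sample size in the abstract;
    # the allele count of 5 is an arbitrary illustration.
    for h, p in sorted(levene_pmf(9, 5).items()):
        print(h, float(p))
```

An excess of probability mass at low h in observed data, relative to this distribution, is the kind of heterozygote deficit that a positive inbreeding coefficient would produce.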