Inference of super-exponential human population growth via efficient computation of the site frequency spectrum for generalized models
Feng Gao, Alon Keinan
The site frequency spectrum (SFS) and other genetic summary statistics are at the heart of many population genetics studies. Previous studies have shown that human populations had undergone a recent epoch of fast growth in effective population size. These studies assumed that growth is exponential, and the ensuing models leave unexplained excess amount of extremely rare variants. This suggests that human populations might have experienced a recent growth with speed faster than exponential. Recent studies have introduced a generalized growth model where the growth speed can be faster or slower than exponential. However, only simulation approaches were available for obtaining summary statistics under such models. In this study, we provide expressions to accurately and efficiently evaluate the SFS and other summary statistics under generalized models, which we further implement in a publicly available software. Investigating the power to infer deviation of growth from being exponential, we observed that decent sample sizes facilitate accurate inference, e.g. a sample of 3000 individuals with the amount of data expected from exome sequencing allows observing and accurately estimating growth with speed deviating by 10% or more from that of exponential. Applying our inference framework to data from the NHLBI Exome Sequencing Project, we found that a model with a generalized growth epoch fits the observed SFS significantly better than the equivalent model with exponential growth (p-value = 3.85 × 10-6). The estimated growth speed significantly deviates from exponential (p-value << 10-12), with the best-fit estimate being of growth speed 12% faster than exponential.