Statistical properties of the site-frequency spectrum associated with Lambda-coalescents

Matthias Birkner, Jochen Blath, Bjarki Eldon

(Submitted on 26 May 2013)

We study statistical properties of the site-frequency spectrum associated with Lambda-coalescents. In particular, we derive recursions for the expected value, variance, and covariance of the spectrum, extending earlier results of Fu (1995) for the classical Kingman coalescent. Our focus is on estimating coalescent parameters introduced by certain Lambda-coalescents for datasets too large for full likelihood methods. The recursions we obtain for the expected values can be used to find the parameter values that give the best fit to the observed frequency spectrum. The expected values are also used to approximate the probability that a (derived) mutation arises on a branch subtending a given number of leaves (DNA sequences), allowing us to apply pseudo-likelihood inference to estimate coalescent parameters associated with certain subclasses of Lambda-coalescents. The properties of the pseudo-likelihood approach are investigated on real and simulated datasets. Our results for two subclasses of Lambda-coalescents show that one can distinguish these subclasses from the Kingman coalescent, as well as from each other. In addition, our results yield further support for multiple-merger coalescents as an appropriate 'null' model at the mitochondrial DNA level for high-fecundity Atlantic cod (Gadus morhua).
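
To illustrate the kind of pseudo-likelihood computation described above, here is a minimal sketch for the Kingman baseline only, where the expected spectrum is known in closed form (E[xi_i] = theta/i, Fu 1995). It is not the authors' implementation: the function names, the multinomial form of the pseudo-likelihood, and the observed counts are assumptions made for illustration. For a Lambda-coalescent, the branch probabilities would instead come from the recursions derived in the paper, which are not reproduced here.

```python
import math


def kingman_branch_probs(n):
    """Normalized expected spectrum under the Kingman coalescent.

    For sample size n, returns phi_i = (1/i) / H_{n-1} for i = 1..n-1,
    i.e. the approximate probability that a mutation falls on a branch
    subtending i leaves. (Hypothetical helper name.)
    """
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def pseudo_log_likelihood(observed_sfs, probs):
    """Multinomial log-likelihood kernel: sum_i xi_i * log(phi_i).

    Assumes a multinomial pseudo-likelihood; the combinatorial constant
    is dropped because it does not depend on the coalescent parameters.
    """
    if len(observed_sfs) != len(probs):
        raise ValueError("spectrum and probabilities must have equal length")
    return sum(x * math.log(p) for x, p in zip(observed_sfs, probs) if x > 0)


if __name__ == "__main__":
    n = 5
    observed = [10, 4, 3, 2]  # hypothetical counts xi_1..xi_4, not real data
    print(pseudo_log_likelihood(observed, kingman_branch_probs(n)))
```

In a parameter search, one would evaluate `pseudo_log_likelihood` over a grid of coalescent parameter values, each with its own branch probabilities, and keep the value that maximizes it.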