A simple, general result for the variance of substitution number in molecular evolution

Bahram Houchmandzadeh, Marcel Vallade

(Submitted on 16 Feb 2016)

The number of substitutions (of nucleotides, amino acids, …) that take place during the evolution of a sequence is a stochastic variable of fundamental importance in the field of molecular evolution. Although the mean number of substitutions during molecular evolution of a sequence can be estimated for a given substitution model, no simple solution exists for the variance of this random variable. We show in this article that the computation of the variance is as simple as that of the mean number of substitutions for both short and long times. Apart from its fundamental importance, this result can be used to investigate the dispersion index R, i.e. the ratio of the variance to the mean substitution number, which is of prime importance in the neutral theory of molecular evolution. By investigating large classes of substitution models, we demonstrate that although R ≥ 1, obtaining R significantly larger than unity generally requires additional hypotheses on the structure of the substitution model.
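To make the dispersion index concrete, the following is a minimal Monte Carlo sketch and not the analytical method of the article. It assumes a single site evolving as a continuous-time Markov chain, and it estimates R as the ratio of the sample variance to the sample mean of the number of substitutions over many replicates. The function names, the two rate matrices and the observation time are hypothetical choices made for illustration. Both matrices are symmetric, so the uniform starting distribution used here is also the equilibrium distribution.

```python
import random
import statistics


def count_substitutions(rates, t_max, rng):
    """Count the jumps of a continuous-time Markov chain up to time t_max.

    rates[i][j] is an illustrative substitution rate from state i to state j
    (i != j); diagonal entries are ignored. The start state is drawn uniformly.
    """
    n = len(rates)
    state = rng.randrange(n)
    t, jumps = 0.0, 0
    while True:
        targets = [j for j in range(n) if j != state]
        weights = [rates[state][j] for j in targets]
        # Waiting time in the current state is exponential with the exit rate.
        t += rng.expovariate(sum(weights))
        if t > t_max:
            return jumps
        # Pick the next state with probability proportional to its rate.
        state = rng.choices(targets, weights=weights)[0]
        jumps += 1


def dispersion_index(rates, t_max, replicates=20000, seed=0):
    """Estimate R = variance / mean of the substitution number by simulation."""
    rng = random.Random(seed)
    counts = [count_substitutions(rates, t_max, rng) for _ in range(replicates)]
    return statistics.pvariance(counts) / statistics.mean(counts)


# Hypothetical model 1: all exit rates equal, so the jump count is Poisson.
EQUAL_RATES = [[0.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

# Hypothetical model 2: states 0 and 1 exchange quickly, all other moves are slow.
UNEQUAL_RATES = [[0.0 if i == j else 0.1 for j in range(4)] for i in range(4)]
UNEQUAL_RATES[0][1] = UNEQUAL_RATES[1][0] = 5.0

if __name__ == "__main__":
    print("R, equal exit rates:  ", dispersion_index(EQUAL_RATES, t_max=5.0))
    print("R, unequal exit rates:", dispersion_index(UNEQUAL_RATES, t_max=5.0))
```

When every state has the same exit rate, the substitution number is Poisson distributed and the estimate should be close to 1. The second matrix was deliberately structured to have very different exit rates, which is one way to push the estimate well above 1.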

