One-rate models outperform two-rate models in site-specific dN/dS estimation
Methods that infer site-specific dN/dS, the ratio of nonsynonymous to synonymous substitution rates, from coding data have been developed primarily to identify positively selected sites (dN/dS > 1). As a consequence, it is largely unknown how accurately different inference methods estimate dN/dS at individual sites. In particular, dN/dS may be estimated using either a one-rate approach, in which dN/dS is represented by a single parameter, or a two-rate approach, in which dN and dS are estimated separately. While some have suggested that the two-rate paradigm may be preferred for positive-selection inference, the relative merits of these two paradigms for site-specific dN/dS estimation remain largely untested. Here, we systematically assess how accurately several popular inference frameworks infer site-specific dN/dS values using alignments simulated within a mutation-selection framework rather than within a dN/dS-based framework. As mutation-selection models describe long-term evolutionary constraints, our simulation approach further allows us to study under what conditions inferred dN/dS captures the underlying equilibrium evolutionary process. We find that one-rate inference models universally outperform two-rate models. Surprisingly, we recover this result even for data simulated with codon bias (i.e., dS varies among sites). Therefore, even when extensive dS variation exists, modeling this variation substantially reduces accuracy. We additionally find that high divergence among sequences is more critical for obtaining precise point estimates than the number of sequences in the alignment. We conclude that inference methods that model dN/dS with a single parameter are the preferred choice for estimating reliable site-specific dN/dS ratios.