Robert Thomson, David Plachetzki, Luke Mahler, Brian Moore
As progress toward a highly resolved tree of life continues to expose nodes that resist resolution, interest in new sources of phylogenetic information for these most difficult relationships continues to grow. One such potential source, the presence and absence of microRNA (miRNA) families, has been vigorously promoted as an ideal phylogenetic marker and has recently been deployed to resolve several long-standing phylogenetic questions. Understanding the utility of such markers for phylogenetic inference hinges on developing a better understanding of how they behave under suitable evolutionary models, as well as how they perform in real inference scenarios. However, as yet, no study has rigorously characterized the statistical behavior or utility of these markers. Here we examine the behavior and performance of miRNA presence/absence data under a variety of evolutionary models and re-examine datasets from several previous studies. We find that highly heterogeneous rates of miRNA gain and loss, pervasive secondary loss, and sampling error collectively render miRNA-based inference of phylogeny difficult, and fundamentally alter the conclusions of four of the five studies that we re-examine. Our results indicate that miRNA data have far less phylogenetic utility in resolving the tree of life than is currently recognized, and we urge ample caution in their interpretation.