Victor Hanson-Smith, Christopher Baker, Alexander Johnson
(Submitted on 11 Jun 2014)
A central challenge in the study of protein evolution is identifying the historical amino acid sequence changes responsible for the novel functions observed in present-day proteins. To address this problem, we developed a new method that identifies and ranks amino acid mutations in ancestral protein sequences according to their function-shifting potential. Our approach scans the changes between two reconstructed ancestral sequences to find (1) sites whose sequence changes deviate significantly from our model-based probabilistic expectations, (2) sites that show extreme changes in mutual information, and (3) sites with extreme gains or losses of information content. By taking the overlap of these statistical signals, the method accurately identifies cryptic evolutionary patterns that are often not obvious from the conservation of modern-day protein sequences alone. We validated the method with a training set of previously discovered function-shifting mutations in three essential protein families in animals and fungi, whose evolutionary histories have been the subject of systematic molecular biological investigation. Our method identified the known function-shifting mutations in the training set with a very low false-positive rate. Further, our approach significantly outperformed other methods that use variability in evolutionary rates to detect functional loci. Its accuracy indicates that it could be a useful tool for generating specific, testable hypotheses about how new functions were acquired across a wide range of protein families.
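The abstract does not give the exact formulas behind signals (2) and (3). The following is a minimal sketch, assuming the standard Shannon definitions: a site's information content is the maximum entropy over 20 amino acids minus the observed entropy, and mutual information is computed between two alignment columns. The function names (`entropy`, `information_content`, `delta_information_content`, `mutual_information`) and the choice of input (a per-site amino acid distribution, for example residue frequencies among the descendants of an ancestral node) are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter

ALPHABET_SIZE = 20  # assumption: the 20 standard amino acids, no gap state


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {residue: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def information_content(dist):
    """Information content in bits: maximum possible entropy minus observed entropy."""
    return math.log2(ALPHABET_SIZE) - entropy(dist)


def delta_information_content(dist_anc1, dist_anc2):
    """Gain (positive) or loss (negative) of information content between two ancestors."""
    return information_content(dist_anc2) - information_content(dist_anc1)


def column_distribution(column):
    """Empirical residue frequencies for one alignment column (a string or list)."""
    counts = Counter(column)
    n = len(column)
    return {aa: c / n for aa, c in counts.items()}


def mutual_information(col_i, col_j):
    """Mutual information in bits between two equal-length alignment columns."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    joint = Counter(zip(col_i, col_j))
    p_i = column_distribution(col_i)
    p_j = column_distribution(col_j)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / (p_i[a] * p_j[b]))
    return mi


if __name__ == "__main__":
    # A site that is ambiguous in the older ancestor but fixed in the younger one.
    anc1 = {"A": 0.25, "L": 0.25, "K": 0.25, "E": 0.25}
    anc2 = {"L": 1.0}
    print(delta_information_content(anc1, anc2))  # approximately 2.0 bits gained

    # Perfectly covarying columns versus independent columns.
    print(mutual_information("AAKK", "DDEE"))  # 1.0
    print(mutual_information("AAKK", "DEDE"))  # 0.0
```

A change in mutual information between two clades could then be taken as `mutual_information` on the columns of one clade's alignment minus the same quantity for the other clade. How the authors normalize these values, define "extreme", and combine them with signal (1) is not specified in the abstract.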