Interrogating conserved elements of diseases using Boolean combinations of orthologous phenotypes
John O Woods , Matthew Z Tien , Edward M Marcotte
Conserved genetic programs often predate the homologous structures and phenotypes to which they give rise; eyes, for example, have evolved several dozen times, but their development seems to involve a common set of conserved genes. Recently, the concept of orthologous phenotypes (or phenologs) offered a quantitative way to describe this property. Phenologs are phenotypes or diseases from separate species who share an unexpectedly large set of their associated gene orthologs. It has been shown that the phenotype pairs which make up a phenolog are mutually predictive in terms of the genes involved. Recently, we demonstrated the ranking of gene–phenotype association predictions using multiple phenologs from an array of species. In this work, we demonstrate a computational method which provides a more targeted view of the conserved pathways which give rise to diseases. Our approach involves the generation of synthetic pseudo-phenotypes made up of Boolean combinations (union, intersection, and difference) of the gene sets for phenotypes from our database. We search for diseases that overlap significantly with these Boolean phenotypes, and find a number of highly predictive combinations. While set unions produce less specific predictions (as expected), intersection and difference-based combinations appear to offer insights into extremely specific aspects of target diseases. For example, breast cancer is predicted by zebrafish methylmercury response minus metal ion response, with predictions MT-COI, JUN, SOD2, GADD45B, and BAX all involved in the pro-apoptotic response to reactive oxygen species, thought to be a key player in cancer. We also demonstrate predictions from Arabidopsis Boolean phenotypes for increased brown adipose tissue in mouse (salt stress response’s intersection with sucrose stimulus response); and for human myopathy (red light response minus water deprivation response). We demonstrate the ranking of predictions for human holoprosencephaly from the set intersections between each pair of a variety of closely-related zebrafish phenotypes. Our results suggest that Boolean phenolog combinations may provide a more informed insight into the conserved pathways underlying diseases than either regular phenologs or the naïve Bayes approach.