Modeling DNA methylation dynamics with approaches from phylogenetics
John A. Capra, Dennis Kostka
(Submitted on 11 Apr 2014)
Methylation of CpG dinucleotides is a prevalent epigenetic modification that is required for proper development in vertebrates, and changes in CpG methylation are essential to cellular differentiation. Genome-wide DNA methylation assays have become increasingly common, and recently distinct stages across differentiating cellular lineages have been assayed. However, current methods for modeling methylation dynamics do not account for the dependency structure between precursor and dependent cell types. We developed a continuous-time Markov chain approach, based on the observation that changes in methylation state over tissue differentiation can be modeled similarly to DNA nucleotide changes over evolutionary time. This model explicitly takes precursor to descendant relationships into account and enables inference of CpG methylation dynamics. To illustrate our method, we analyzed a high-resolution methylation map of the differentiation of mouse stem cells into several blood cell types. Our model can successfully infer unobserved CpG methylation states from observations at the same sites in related cell types (90% correct), and this approach more accurately reconstructs missing data than imputation based on neighboring CpGs (84% correct). Additionally, the single CpG resolution of our methylation dynamics estimates enabled us to show that DNA sequence context of CpG sites is informative about methylation dynamics across tissue differentiation. Finally, we identified genomic regions with clusters of highly dynamic CpGs and present a likely functional example. Our work establishes a framework for inference and modeling that is well-suited to DNA methylation data, and our success suggests that other methods for analyzing DNA nucleotide substitutions will also translate to the modeling of epigenetic phenomena.
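To make the continuous-time Markov chain idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a CpG site has just two states (0 = unmethylated, 1 = methylated) and that a precursor and its descendant cell type are separated by a branch of length `t`. The rate names `rate_gain` and `rate_loss`, the function names, and all numeric values are hypothetical and chosen only for illustration; the abstract does not specify the state space or parameterization. For a two-state chain the transition probabilities have a closed form, so no matrix exponential routine is needed.

```python
import math


def transition_matrix(rate_gain, rate_loss, t):
    """Transition probabilities P(t) of a two-state CTMC.

    States: 0 = unmethylated, 1 = methylated (an assumption of this sketch).
    rate_gain: hypothetical rate of 0 -> 1 changes.
    rate_loss: hypothetical rate of 1 -> 0 changes.
    t: branch length between a precursor and a descendant cell type.
    Returns P where P[i][j] is the probability of state j in the
    descendant given state i in the precursor.
    """
    total = rate_gain + rate_loss
    pi_m = rate_gain / total  # stationary probability of methylated
    pi_u = rate_loss / total  # stationary probability of unmethylated
    decay = math.exp(-total * t)
    return [
        [pi_u + pi_m * decay, pi_m * (1.0 - decay)],
        [pi_u * (1.0 - decay), pi_m + pi_u * decay],
    ]


def predict_descendant(precursor_state, rate_gain, rate_loss, t):
    """Most probable descendant state given the observed precursor state."""
    row = transition_matrix(rate_gain, rate_loss, t)[precursor_state]
    return max(range(2), key=lambda state: row[state])


if __name__ == "__main__":
    # Illustrative values only; they are not estimates from the paper.
    P = transition_matrix(rate_gain=0.2, rate_loss=0.1, t=0.5)
    for row in P:
        print([round(p, 3) for p in row])
    print(predict_descendant(1, rate_gain=0.2, rate_loss=0.1, t=0.5))
```

The same structure is what phylogenetic substitution models use for nucleotides, with four states instead of two. A fuller treatment would combine such matrices along every branch of the lineage tree to compute the likelihood of all observed cell types at a site.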