Distribution of gene tree histories under the coalescent model with gene flow
Yuan Tian, Laura Kubatko
We propose a coalescent model for three species that allows gene flow between both pairs of sister populations. The model is designed to analyze multilocus genomic sequence alignments, with one sequence sampled from each of the three species. The model is formulated using a Markov chain representation, which allows use of matrix exponentiation to compute analytical expressions for the probability density of gene tree genealogies. The gene tree history distribution as well as the gene tree topology distribution under this coalescent model with gene flow are then calculated via numerical integration. We analyze the model to compare the distributions of gene tree topologies and gene tree histories for species trees with differing effective population sizes and gene flow rates. Our results suggest conditions under which the species tree and associated parameters are not identifiable from the gene tree topology distribution when gene flow is present, but indicate that the gene tree history distribution may identify the species tree and associated parameters. Thus, the gene tree history distribution can be used to infer parameters such as the ancestral effective population sizes and the rates of gene flow in a maximum likelihood (ML) framework. We conduct computer simulations to evaluate the performance of our method in estimating these parameters, and we apply our method to an Afrotropical mosquito data set (Fontaine et al., 2015) to demonstrate the usefulness of our method for the analysis of empirical data. Key words: coalescent, gene flow, migration, hybridization, gene tree, topology, history, maximum likelihood, speciation.