An Invariants-based Method for Efficient Identification of Hybrid Species From Large-scale Genomic Data

Laura Kubatko, Julia Chifman

Independent evolution of ab- and adaxial stomatal density enables adaptation

Christopher David Muir, Miquel Àngel Conesa, Jeroni Galmés

Evidence of adoption, monozygotic twinning, and low inbreeding rates in a large genetic pedigree of polar bears

René M. Malenfant, David W. Coltman, Evan S. Richardson, Nicholas J. Lunn, Ian Stirling, Elizabeth Adamowicz, Corey S. Davis

An evolutionary hourglass of herbivore-induced transcriptomic responses in Nicotiana attenuata

Matthew Durrant, Justin Boyer, Ian T. Baldwin, Shuqing Xu

What kind of maternal effects are selected for in fluctuating environments?

Stephen R Proulx, Henrique Teotonio

BrowseVCF: a web-based application and workflow to quickly prioritise disease-causative variants in VCF files.

Silvia Salatino, Varun Ramraj, Stefano Lise

Ancestral genome reconstruction reveals the history of ecological diversification in Agrobacterium.

Florent Lassalle, Remi Planel, Simon Penel, David Chapulliot, Valerie Barbe, Audrey Dubost, Alexandra Calteau, David Vallenet, Damien Mornico, Laurent Gueguen, Ludovic Vial, Daniel Muller, Vincent Daubin, Xavier Nesme

Efficient Bayesian species tree inference under the multi-species coalescent

Bruce Rannala, Ziheng Yang

A method was developed for Bayesian inference of species phylogeny using the multi-species coalescent model. To improve the mixing properties of the Markov chain Monte Carlo (MCMC) algorithm that traverses the space of species trees, we implement two efficient MCMC proposals: the first is based on the Subtree Pruning and Regrafting (SPR) algorithm and the second is based on a novel node-slider algorithm. Like the Nearest-Neighbor Interchange (NNI) algorithm we implemented previously, both algorithms propose changes to the species tree, while simultaneously altering the gene trees at multiple genetic loci to automatically avoid conflicts with the newly proposed species tree. The method integrates over gene trees, naturally taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. A simulation study was performed to examine the statistical properties of the new method. We found that it has excellent statistical performance, inferring the correct species tree with near certainty when analyzing 10 loci. The prior on species trees has some impact, particularly for small numbers of loci. An empirical dataset (for rattlesnakes) was reanalyzed. While the 18 nuclear loci and one mitochondrial locus support largely consistent species trees under the multi-species coalescent model, estimates of parameters suggest drastically different evolutionary dynamics between the nuclear and mitochondrial loci.

The non-equilibrium allele frequency spectrum in a Poisson random field framework

Ingemar Kaj, Carina F. Mugal

In population genetic studies, the allele frequency spectrum (AFS) efficiently summarizes genome-wide polymorphism data and shapes a variety of allele frequency-based summary statistics. While existing theory typically features equilibrium conditions, emerging methodology requires an analytical understanding of the build-up of the allele frequencies over time. In this work, we use the framework of Poisson random fields to derive new representations of the non-equilibrium AFS for the case of a Wright-Fisher population model with selection. In our approach, the AFS is a scaling-limit of the expectation of a Poisson stochastic integral and the representation of the non-equilibrium AFS arises in terms of a fixation time probability distribution. The known duality between the Wright-Fisher diffusion process and a birth and death process generalizing Kingman’s coalescent yields an additional representation. The results carry over to the setting of a random sample drawn from the population and provide the non-equilibrium behavior of sample statistics. Our findings are consistent with and extend a previous approach where the non-equilibrium AFS solves a partial differential forward equation with a non-traditional boundary condition. Moreover, we provide a bridge to previous coalescent-based work, and hence tie several frameworks together. Since frequency-based summary statistics are widely used in population genetics, for example, to identify candidate loci of adaptive evolution, to infer the demographic history of a population, or to improve our understanding of the underlying mechanics of speciation events, the presented results are potentially useful for a broad range of topics.
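
As a rough numerical companion to the abstract above, the build-up of the allele frequency spectrum from an initially monomorphic population can be imitated with a small forward simulation. This is only an illustrative sketch and not the authors' analytical derivation: it assumes a haploid Wright-Fisher population of at least two individuals, a selection coefficient greater than -1, independent sites, and new mutations entering as single copies at a Poisson-distributed number of sites per generation. The names `poisson`, `simulate_spectrum`, `pop_size`, `mutation_rate` and `s` are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's product method (adequate for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_spectrum(pop_size, mutation_rate, s, generations, seed=0):
    """Return spectrum[i] = number of segregating sites carrying i derived copies.

    Assumptions (illustrative only): haploid Wright-Fisher resampling,
    selection coefficient s acting on the derived allele, independent sites,
    and Poisson(mutation_rate) new single-copy mutations per generation.
    """
    rng = random.Random(seed)
    counts = []  # derived-allele copy number at each segregating site
    for _ in range(generations):
        survivors = []
        for c in counts:
            x = c / pop_size
            # Deterministic shift in frequency due to selection.
            x_sel = x * (1 + s) / (1 + s * x)
            # Binomial resampling (genetic drift) as a sum of Bernoulli draws.
            new_c = sum(rng.random() < x_sel for _ in range(pop_size))
            # Sites that are lost or fixed no longer segregate.
            if 0 < new_c < pop_size:
                survivors.append(new_c)
        # New mutations enter the population as single copies.
        survivors.extend([1] * poisson(rng, mutation_rate))
        counts = survivors
    spectrum = [0] * (pop_size + 1)
    for c in counts:
        spectrum[c] += 1
    return spectrum
```

Calling `simulate_spectrum` with increasing values of `generations` shows how the spectrum fills in over time before approaching its stationary shape; averaging over many seeds would be needed to approximate an expectation.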

Score distributions of gapped multiple sequence alignments down to the low-probability tail

Pascal Fieth, Alexander K. Hartmann

Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in the case of finite sequence lengths. Here we extend the studies to multiple sequence alignments with gaps, a case that is much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^-160, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignment differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.
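
For orientation, the Gumbel reference form mentioned in the abstract above can be evaluated numerically as follows. This is a minimal sketch of the standard Karlin-Altschul asymptotic tail, P(S >= x) ≈ 1 - exp(-K m n e^(-λx)), which is the baseline the abstract reports deviations from; it is not the rare-event sampling method of the paper. The function name `gumbel_tail` and the parameter values used in any call are hypothetical.

```python
import math


def gumbel_tail(score, m, n, lam, k):
    """Asymptotic Gumbel tail probability P(S >= score) for sequence lengths m and n.

    lam and k are the scale parameters of the Gumbel form; the values passed
    in are assumptions for illustration, not fitted constants.
    """
    expected_hits = k * m * n * math.exp(-lam * score)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny,
    # which is exactly the low-probability tail of interest.
    return -math.expm1(-expected_hits)
```

The use of `math.expm1` matters in the far tail: computing `1 - math.exp(-x)` directly rounds to zero once `x` falls below floating-point resolution, whereas the form above still returns a small positive probability.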