Speciation and introgression between Mimulus nasutus and Mimulus guttatus

Yaniv Brandvain, Amanda M. Kenney, Lex Flagel, Graham Coop, Andrea L Sweigart
(Submitted on 26 Oct 2013)

Mimulus guttatus and M. nasutus are an evolutionary and ecological model sister species pair differentiated by ecology, mating system, and partial reproductive isolation. Despite extensive research on this system, the history of divergence and differentiation in this sister pair is unclear. We present and analyze a novel population genomic data set which shows that M. nasutus “budded” off of a central Californian M. guttatus population within the last 200 to 500 thousand years. In this time, the M. nasutus genome has accrued numerous genomic signatures of the transition to predominant selfing. Despite clear biological differentiation, we document ongoing, bidirectional introgression. We observe a negative relationship between the recombination rate and divergence between M. nasutus and sympatric M. guttatus samples, suggesting that selection acts against M. nasutus ancestry in M. guttatus.

Discovery of Phylogenetic Relevant Y-chromosome Variants in 1000 Genomes Project Data

Chuan-Chao Wang, Hui Li
(Submitted on 24 Oct 2013)

Current Y chromosome research is limited by the poor resolution of the Y chromosome phylogenetic tree. Entirely sequenced Y chromosomes from numerous human individuals have only recently become available with the advent of next-generation sequencing technology. The 1000 Genomes Project has sequenced Y chromosomes from more than 1000 males. Here, we analyzed 1000 Genomes Project Y chromosome data from 1269 individuals and discovered about 25,000 phylogenetically relevant SNPs. These new markers are useful for Y chromosome phylogeny and will increase the phylogenetic resolution of many Y chromosome studies.

Stochastic dynamics of adaptive trait and neutral marker driven by eco-evolutionary feedbacks

Sylvain Billiard (GEPV), Regis Ferriere (CNRS UMR 7625), Sylvie Méléard (CMAP), Viet Chi Tran (LPP)
(Submitted on 23 Oct 2013)

We investigate how neutral diversity is affected by selection and adaptation in an eco-evolutionary framework. In our model, we study a finite population in continuous time, where each individual is characterized by a trait under selection and a completely linked neutral marker. Population dynamics are driven by births and deaths, mutations at birth, and competition between individuals. Trait values influence ecological processes (demographic events, competition), and competition generates selection on trait variation, thus closing the eco-evolutionary feedback loop. The demographic effects of the trait are also expected to influence the generation and maintenance of neutral variation. We consider a large population limit with rare mutation, under the assumption that the neutral marker mutates faster than the trait under selection. We prove the convergence of the stochastic individual-based process to a new measure-valued diffusive process with jumps that we call the Substitution Fleming-Viot Process (SFVP). When restricted to the trait space, this process is the Trait Substitution Sequence first introduced by Metz et al. (1996). During the invasion of a favorable mutation, a genetic bottleneck occurs and the marker associated with this favorable mutant hitchhikes. By rigorously analysing the hitchhiking effect and how the neutral diversity is restored afterwards, we obtain the condition for a time-scale separation; under this condition, we show that the marker distribution is approximated by a Fleming-Viot distribution between two trait substitutions. We discuss the implications of the SFVP for our understanding of the dynamics of neutral variation under eco-evolutionary feedbacks and illustrate the main phenomena with simulations. Our results highlight the joint importance of mutations, ecological parameters, and trait values in the restoration of neutral diversity after a selective sweep.

Cryptic Genetic Variation Can Make Irreducible Complexity a Common Mode of Adaptation

Meredith V. Trotter, Daniel B. Weissman, Grant I. Peterson, Kayla M. Peck, Joanna Masel
(Submitted on 22 Oct 2013)

The existence of complex (multiple-step) genetic adaptations that are “irreducible” (i.e., all partial combinations are less fit than the original genotype) is one of the longest standing problems in evolutionary biology. In standard genetics parlance, these adaptations require the crossing of a wide adaptive valley of deleterious intermediate stages. Here we demonstrate, using a simple model, that evolution can cross wide valleys to produce “irreducibly complex” adaptations by making use of previously cryptic mutations. When revealed by an evolutionary capacitor, previously cryptic mutants have higher initial frequencies than do new mutations, bringing them closer to a valley-crossing saddle in allele frequency space. Moreover, simple combinatorics imply an enormous number of candidate combinations exist within available cryptic genetic variation. We model the dynamics of crossing of a wide adaptive valley after a capacitance event using both numerical simulations and analytical approximations. Although individual valley crossing events become less likely as valleys widen, by taking the combinatorics of genotype space into account, we see that revealing cryptic variation can cause the frequent evolution of complex adaptations. This finding also effectively dismantles “irreducible complexity” as an argument against evolution by providing a general mechanism for crossing wide adaptive valleys.
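
The combinatorial point in the abstract can be made concrete with a small sketch. The number of cryptic variants (100) and the range of adaptation sizes are hypothetical values chosen only for illustration; they are not taken from the paper, and the function name is made up here.

```python
from math import comb


def candidate_combinations(n_variants, k):
    """Number of distinct k-variant combinations among n_variants cryptic variants."""
    return comb(n_variants, k)


# Hypothetical standing pool of cryptic variants (not a figure from the paper).
N_CRYPTIC_VARIANTS = 100

for k in range(2, 6):
    # The count grows quickly with the number of steps k in the adaptation.
    print(k, candidate_combinations(N_CRYPTIC_VARIANTS, k))
```

Even if any single combination is unlikely to cross its valley, the number of combinations available to try grows rapidly with the number of mutations involved, which is the sense in which the abstract appeals to the combinatorics of genotype space.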

Mutant epigenetic machinery mediates climate adaptation in Arabidopsis thaliana

Xia Shen, Simon Forsberg, Mats Pettersson, Zheya Sheng, Orjan Carlborg
(Submitted on 16 Oct 2013)

The genetic basis of adaptation to climate is largely unknown. We explored the genetic regulation of climate plasticity and its contribution to adaptation using publicly available data from two collections of natural Arabidopsis thaliana accessions from a wide range of habitats. Sixteen loci with plastic alleles were mapped and many of these contained candidate genes with amino acid changes. The Chromomethylase 2 (CMT2) genotype influenced adaptation to seasonal temperature variability and accessions carrying a mutant CMT2 allele disrupting the genome-wide CHH-methylation pattern displayed a more plastic response to climate. We conclude that genetic regulation of plasticity appears to be important for climate adaptation and that genetic variation in the epigenetic machinery, leading to altered genome-wide epigenetic modifications, is one of the underlying molecular mechanisms.

A latent factor model with a mixture of sparse and dense factors to model gene expression data with confounding effects

Chuan Gao, Christopher D Brown, Barbara E Engelhardt
(Submitted on 17 Oct 2013)

One important problem in genome science is to determine sets of co-regulated genes based on measurements of gene expression levels across samples, where the quantification of expression levels includes substantial technical and biological noise. To address this problem, we developed a Bayesian sparse latent factor model that uses a three parameter beta prior to flexibly model shrinkage in the loading matrix. By applying three layers of shrinkage to the loading matrix (global, factor-specific, and element-wise), this model has non-parametric properties in that it estimates the appropriate number of factors from the data. We added a two-component mixture to model each factor loading as being generated from either a sparse or a dense mixture component; this allows dense factors that capture confounding noise, and sparse factors that capture local gene interactions. We developed two statistics to quantify the stability of the recovered matrices for both sparse and dense matrices. We tested our model on simulated data and found that we successfully recovered the true latent structure as compared to related models. We applied our model to a large gene expression study and found that we recovered known covariates and small groups of co-regulated genes. We validated these gene subsets by testing for associations between genotype data and these latent factors, and we found a substantial number of biologically important genetic regulators for the recovered gene subsets.

General triallelic frequency spectrum under demographic models with variable population size

Paul A. Jenkins, Jonas W. Mueller, Yun S. Song
(Submitted on 13 Oct 2013)

It is becoming routine to obtain datasets on DNA sequence variation across several thousands of chromosomes, providing unprecedented opportunity to infer the underlying biological and demographic forces. Such data make it vital to study summary statistics which offer enough compression to be tractable, while preserving a great deal of information. One well-studied summary is the site frequency spectrum: the empirical distribution, across segregating sites, of the sample frequency of the derived allele. However, most previous theoretical work has assumed that each site has experienced at most one mutation event in its genealogical history, which becomes less tenable for very large sample sizes. In this work we obtain, in closed form, the predicted frequency spectrum of a site that has experienced at most two mutation events, under very general assumptions about the distribution of branch lengths in the underlying coalescent tree. Among other applications, we obtain the frequency spectrum of a triallelic site in a model of historically varying population size. We demonstrate the utility of our formulas in two settings: First, we show that triallelic sites are more sensitive to the parameters of a population that has experienced historical growth, suggesting that they will have use if they can be incorporated into demographic inference. Second, we investigate a recently proposed alternative mechanism of mutation in which the two derived alleles of a triallelic site are created simultaneously within a single individual, and we develop a test to determine whether it is responsible for the excess of triallelic sites in the human genome.
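
As a point of reference for the summary statistic described above, here is a minimal sketch of the ordinary (biallelic) site frequency spectrum. It assumes each site is coded 0 for the ancestral allele and 1 for the derived allele, which is the one-mutation-per-site setting that the paper relaxes; the function name and the toy data are illustrative only and do not come from the paper.

```python
from collections import Counter


def site_frequency_spectrum(sites):
    """Unfolded SFS from biallelic data.

    sites: list of sites, each a list of 0/1 alleles (0 = ancestral,
    1 = derived), one entry per sampled chromosome.
    Returns a list whose entry k-1 is the number of segregating sites at
    which the derived allele appears on exactly k chromosomes, k = 1..n-1.
    """
    if not sites:
        return []
    n = len(sites[0])
    derived_counts = Counter(sum(site) for site in sites)
    # Counts of 0 and n are monomorphic in the sample and are excluded.
    return [derived_counts.get(k, 0) for k in range(1, n)]


# Toy sample: 5 sites scored on 4 chromosomes (illustrative data only).
toy_sites = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(site_frequency_spectrum(toy_sites))
```

A triallelic site would need two derived-allele counts per site rather than one, so this representation would have to be extended to a two-dimensional table to hold the spectrum the abstract describes.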

Non-identifiability of identity coefficients at biallelic loci

Miklós Csűrös
(Submitted on 13 Oct 2013)

Shared genealogies introduce allele dependencies in diploid genotypes, as alleles within an individual or between different individuals will likely match when they originate from a recent common ancestor. At a locus shared by a pair of diploid individuals, there are nine combinatorially distinct modes of identity-by-descent (IBD), capturing all possible combinations of coancestry and inbreeding. A distribution over the IBD modes is described by the nine associated probabilities, known as (Jacquard’s) identity coefficients. The genetic relatedness between two individuals can be succinctly characterized by the identity coefficients corresponding to the joint genealogy. The identity coefficients (together with allele frequencies) determine the distribution of joint genotypes at a locus. At a locus with two possible alleles, identity coefficients are not identifiable because different coefficients can generate the same genotype distribution.
We analyze precisely how different IBD modes combine into identical genotype distributions at diallelic loci. In particular, we describe IBD mode mixtures that result in identical genotype distributions at all allele frequencies, implying the non-identifiability of the identity coefficients from independent loci. Our analysis yields an exhaustive characterization of relatedness statistics that are always identifiable. Importantly, we show that identifiable relatedness statistics include the kinship coefficient (probability that a random pair of alleles are identical by descent between individuals) and inbreeding-related measures, which can thus be estimated from genotype distributions at independent loci.
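
The identifiable statistics mentioned above can be written directly in terms of the nine identity coefficients. The sketch below uses the standard Jacquard ordering (states 1 to 9) and the textbook expressions for the kinship and inbreeding coefficients; it is not code from the paper, and the function name is an assumption made for illustration.

```python
def kinship_and_inbreeding(delta):
    """Kinship and inbreeding coefficients from Jacquard's identity coefficients.

    delta: sequence of nine probabilities (Delta_1 .. Delta_9 in the standard
    Jacquard ordering) for a pair of diploid individuals A and B.
    Returns (kinship, inbreeding_A, inbreeding_B).
    """
    if len(delta) != 9:
        raise ValueError("expected nine identity coefficients")
    if abs(sum(delta) - 1.0) > 1e-9:
        raise ValueError("identity coefficients must sum to 1")
    d1, d2, d3, d4, d5, d6, d7, d8, d9 = delta
    # Probability that one allele drawn at random from A and one from B are IBD.
    kinship = d1 + 0.5 * (d3 + d5 + d7) + 0.25 * d8
    # Probability that the two alleles within each individual are IBD.
    inbreeding_a = d1 + d2 + d3 + d4
    inbreeding_b = d1 + d2 + d5 + d6
    return kinship, inbreeding_a, inbreeding_b


# Outbred full siblings: Delta_7 = 1/4, Delta_8 = 1/2, Delta_9 = 1/4.
full_sibs = (0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.25)
print(kinship_and_inbreeding(full_sibs))
```

Different coefficient vectors can give the same three values, which is consistent with the abstract's point that these statistics are identifiable even though the full vector of coefficients is not.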

forqs: Forward-in-time Simulation of Recombination, Quantitative Traits, and Selection

Darren Kessner, John Novembre
(Submitted on 11 Oct 2013)

forqs is a forward-in-time simulation of recombination, quantitative traits, and selection. It was designed to investigate haplotype patterns resulting from scenarios where substantial evolutionary change has taken place in a small number of generations due to recombination and/or selection on polygenic quantitative traits. forqs is implemented as a command-line C++ program. Source code and binary executables for Linux, OSX, and Windows are freely available under a permissive BSD license.

The Fossilized Birth-Death Process: A Coherent Model of Fossil Calibration for Divergence Time Estimation

Tracy A. Heath, John P. Huelsenbeck, Tanja Stadler
(Submitted on 10 Oct 2013)

Time-calibrated species phylogenies are critical for addressing a wide range of questions in evolutionary biology, such as those that elucidate historical biogeography or uncover patterns of coevolution and diversification. Because molecular sequence data are not informative on absolute time, external data, most commonly fossil age estimates, are required to calibrate estimates of species divergence dates. For Bayesian divergence-time methods, the common practice for calibration using fossil information involves placing arbitrarily chosen parametric distributions on internal nodes, often disregarding most of the information in the fossil record. We introduce the ‘fossilized birth-death’ (FBD) process, a model for calibrating divergence-time estimates in a Bayesian framework, explicitly acknowledging that extant species and fossils are part of the same macroevolutionary process. Under this model, absolute node age estimates are calibrated by a single diversification model and arbitrary calibration densities are not necessary. Moreover, the FBD model allows for inclusion of all available fossils. We performed analyses of simulated data and show that node-age estimation under the FBD model results in robust and accurate estimates of species divergence times with realistic measures of statistical uncertainty, overcoming major limitations of standard divergence time estimation methods. We then used this model to estimate the speciation times for a dataset composed of all living bears, indicating that the genus Ursus diversified in the late Miocene to mid Pliocene.