Rise and fall of asexual mutators in adapted populations

Ananthu James, Kavita Jain
Subjects: Populations and Evolution (q-bio.PE)

In an adapted population in which most mutations are deleterious, the mutation rates are expected to be low. Indeed, in recent experiments on adapted populations of asexual mutators, beneficial mutations that lower the mutation rates have been observed to get fixed. Using a multitype branching process and a deterministic argument, we calculate the time to fix the wildtype mutation rate in an asexual population of mutators, and find it to be a U-shaped function of the population size. In contrast, the fixation time for mutators is known to increase with population size. On comparing these two time scales, we find that a critical population size exists below which the mutators prevail, while the mutation rate remains low in larger populations. We also discuss how our analytical results compare with the experiments.

Estimating phylogenetic trees from genome-scale data

Liang Liu, Zhenxiang Xi, Shaoyuan Wu, Charles Davis, Scott V. Edwards
Comments: 39 pages, 3 figures
Subjects: Populations and Evolution (q-bio.PE)

As researchers collect increasingly large molecular data sets to reconstruct the Tree of Life, the heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. A class of phylogenetic methods known as “species tree methods” has been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting or deep coalescence that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Although such methods are gaining in popularity, they are being adopted with caution in some quarters, in part because of an increasing number of examples of strong phylogenetic conflict between concatenation or supermatrix methods and species tree methods. Here we review theory and empirical examples that help clarify these conflicts. Thinking of concatenation as a special case of the more general model provided by the multispecies coalescent can help explain a number of differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences, base compositional heterogeneity and long branch attraction. We show that approaches such as binning, designed to augment the signal in species tree analyses, can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods that incorporate biological realism are key to phylogenetic analysis of whole genome data.
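
As a rough illustration of how incomplete lineage sorting produces a diversity of gene trees from a single species tree, the sketch below evaluates the standard three-taxon result under the multispecies coalescent: for a species tree ((A,B),C) whose internal branch has length t in coalescent units, the gene tree matches the species tree with probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t). The function name and the dictionary keys are hypothetical and are not taken from the review; this is a minimal sketch, not any species tree method discussed in it.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t is the internal branch length in coalescent units (assumed >= 0).
    Hypothetical helper for illustration only.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages fail to coalesce on the
    # internal branch; all three topologies are then equally likely.
    p_no_coalescence = math.exp(-t)
    discordant = p_no_coalescence / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # concordant with species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


if __name__ == "__main__":
    for t in (0.0, 0.5, 2.0):
        print(t, three_taxon_gene_tree_probs(t))
```

Short internal branches (small t) push all three topologies toward probability 1/3, which is the rapid-radiation regime where gene trees and the species tree most often disagree.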

The SMC’ is a highly accurate approximation to the ancestral recombination graph

Peter R. Wilton, Shai Carmi, Asger Hobolth
(Submitted on 12 Jan 2015)

Two sequentially Markov coalescent models (SMC and SMC’) are available as tractable approximations to the ancestral recombination graph (ARG). We present a model of coalescence at two fixed points along a pair of sequences evolving under the SMC’. Using our model, we derive a number of new quantities related to the pairwise SMC’, thereby analytically quantifying for the first time the similarity between the SMC’ and ARG. We use our model to show that the joint distribution of pairwise coalescence times at recombination sites under the SMC’ is the same as it is marginally under the ARG, demonstrating that the SMC’ is the canonical first-order sequentially Markov approximation to the pairwise ARG. Finally, we use these results to show that population size estimates under the pairwise SMC are asymptotically biased, while under the pairwise SMC’ they are approximately asymptotically unbiased.

The Time-Scale of Recombination Rate Evolution in Great Apes

Laurie S Stevison, August E Woerner, Jeffrey M Kidd, Joanna L Kelley, Krishna R Veeramah, Kimberly F McManus, Carlos D Bustamante, Michael F Hammer, Jeffrey D Wall
doi: http://dx.doi.org/10.1101/013755

We present three linkage-disequilibrium (LD)-based recombination maps generated using whole-genome sequencing data of 10 Nigerian chimpanzees, 13 bonobos, and 15 western gorillas, collected as part of the Great Ape Genome Project (Prado-Martinez et al. 2013). Using species-specific PRDM9 sequences to predict potential binding sites, we identified an important role for PRDM9 in predicting recombination rate variation broadly across great apes. Our results are contrary to previous research suggesting that PRDM9 is not associated with recombination in western chimpanzees (Auton et al. 2012). Additionally, we show that fewer hotspots are shared among chimpanzee subspecies than within human populations, further narrowing the time-scale of complete hotspot turnover. We quantified the variation in the biased distribution of recombination rates towards recombination hotspots across great apes. We found that correlations between broad-scale recombination rates decline more rapidly than nucleotide divergence between species. We also compared the skew of recombination rates at centromeres and telomeres between species and show a skew from chromosome means extending as far as 10-15 Mb from chromosome ends. Further, we examined broad-scale recombination rate changes near a translocation in gorillas and found minimal differences as compared to other great ape species, perhaps because the coordinates relative to the chromosome ends were unaffected. Finally, based on multiple linear regression analysis, we found that various correlates of recombination rate persist throughout primates, including repeats, diversity, divergence and local effective population size (Ne). Our study is the first to analyze within- and between-species genome-wide recombination rate variation in several close relatives.

The P-element strikes again: the recent invasion of natural Drosophila simulans populations

Robert Kofler, Tom Hill, Viola Nolte, Andrea Betancourt, Christian Schlötterer
doi: http://dx.doi.org/10.1101/013722

The P-element is one of the best understood eukaryotic transposable elements. It invaded Drosophila melanogaster populations within a few decades, but was thought to be absent from close relatives, including D. simulans. Five decades after the spread in D. melanogaster, we provide evidence that the P-element has also invaded D. simulans. P-elements in D. simulans appear to have been acquired recently from D. melanogaster, probably via a single horizontal transfer event. Expression data indicate that the P-element is processed in the germline of D. simulans, and genomic data show an enrichment of P-element insertions in putative origins of replication, similar to that seen in D. melanogaster. This ongoing spread of the P-element in natural populations provides a unique opportunity to understand the dynamics of transposable element spreads and the associated piRNA defense mechanisms.

Distributions of topological tree metrics between a species tree and a gene tree

Jing Xi, Jin Xie, Ruriko Yoshida
(Submitted on 10 Jan 2015)

To conduct a statistical analysis on a given set of phylogenetic gene trees, we often use a distance measure between two trees. In a statistical distance-based method for analyzing discordance between gene trees, it is key to choose a “biologically meaningful” and “statistically well-distributed” distance between trees. Thus, in this paper, we study the distributions of three tree distance metrics between two trees: the edge difference, the path difference, and the precise K interval cospeciation distance. First, we focus on the distributions of the three tree distances between two random unrooted trees with n leaves (n≥4); then we focus on the distributions of the three tree distances between a fixed rooted species tree with n leaves and a random gene tree with n leaves generated under the coalescent process given the species tree. We present some theoretical results as well as a simulation study of these distributions.
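
To make one of these metrics concrete, the sketch below computes the path difference under its usual definition, which is assumed here rather than quoted from the paper: for each pair of leaves, count the edges on the path between them in each tree, then take the Euclidean norm of the difference between the two resulting vectors. Trees are represented as adjacency dictionaries; the helper names (`leaf_path_lengths`, `path_difference`, `quartet`) are hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def leaf_path_lengths(adj, leaves):
    """Number of edges between every pair of leaves, found by BFS."""
    hops_from = {}
    for src in leaves:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen[nbr] = seen[node] + 1
                    queue.append(nbr)
        hops_from[src] = seen
    return {(a, b): hops_from[a][b] for a, b in combinations(sorted(leaves), 2)}


def path_difference(adj1, adj2, leaves):
    """Euclidean norm of the difference of the two path-length vectors."""
    d1 = leaf_path_lengths(adj1, leaves)
    d2 = leaf_path_lengths(adj2, leaves)
    return math.sqrt(sum((d1[pair] - d2[pair]) ** 2 for pair in d1))


def quartet(left, right):
    """Unrooted four-leaf tree with cherries `left` and `right`."""
    (a, b), (c, d) = left, right
    adj = {"u": [a, b, "v"], "v": [c, d, "u"]}
    for leaf, inner in ((a, "u"), (b, "u"), (c, "v"), (d, "v")):
        adj[leaf] = [inner]
    return adj


if __name__ == "__main__":
    t1 = quartet(("A", "B"), ("C", "D"))
    t2 = quartet(("A", "C"), ("B", "D"))
    print(path_difference(t1, t2, ["A", "B", "C", "D"]))  # prints 2.0
```

For the two quartets in the usage example, four of the six leaf pairs change their path length by one edge, so the distance is sqrt(4) = 2.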

Reprogramming LCLs to iPSCs Results in Recovery of Donor-Specific Gene Expression Signature

Samantha M Thomas, Courtney Kagan, Bryan J Pavlovic, Jonathan Burnett, Kristen Patterson, Jonathan K Pritchard, Yoav Gilad
doi: http://dx.doi.org/10.1101/013631

Renewable in vitro cell cultures, such as lymphoblastoid cell lines (LCLs), have facilitated studies that contributed to our understanding of genetic influence on human traits. However, the degree to which cell lines faithfully maintain differences in donor-specific phenotypes is still debated. We have previously reported that standard cell line maintenance practice results in a loss of donor-specific gene expression signatures in LCLs. An alternative to the LCL model is the induced pluripotent stem cell (iPSC) system, which carries the potential to model tissue-specific physiology through the use of differentiation protocols. Still, existing LCL banks represent an important source of starting material for iPSC generation, and it is possible that the disruptions in gene regulation associated with long-term LCL maintenance could persist through the reprogramming process. To address this concern, we studied the effect of reprogramming mature LCLs to iPSCs on the ensuing gene expression patterns within and between six unrelated donor individuals. We show that the reprogramming process results in a recovery of donor-specific gene regulatory signatures. Since environmental contributions are unlikely to be a source of individual variation in our system of highly passaged cultured cell lines, our observations suggest that the effect of genotype on gene regulation is more pronounced in the iPSCs than in the LCL precursors. Our findings indicate that iPSCs can be a powerful model system for studies of phenotypic variation across individuals in general, and the genetic association with variation in gene regulation in particular. We further conclude that LCLs are an appropriate starting material for iPSC generation.

Software for the analysis and visualization of deep mutational scanning data

Jesse D Bloom
doi: http://dx.doi.org/10.1101/013623

Background: Deep mutational scanning is a technique to estimate the impacts of mutations on a gene by using deep sequencing to count mutations in a library of variants before and after imposing a functional selection. The impacts of mutations must be inferred from changes in their counts after selection. Results: I describe a software package, dms_tools, to infer the impacts of mutations from deep mutational scanning data using a likelihood-based treatment of the mutation counts. I show that dms_tools yields more accurate inferences on simulated data than the widely used but statistically biased approach of calculating ratios of counts pre- and post-selection. Using dms_tools, one can infer the preference of each site for each amino acid given a single selection pressure, or assess the extent to which these preferences change under different selection pressures. The preferences and their changes can be intuitively visualized with sequence-logo-style plots created using an extension to weblogo. Conclusions: dms_tools implements a statistically principled approach for the analysis and subsequent visualization of deep mutational scanning data.
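
For orientation, the count-ratio baseline that the abstract contrasts with dms_tools might look roughly like the sketch below. This is not the dms_tools API or its likelihood-based algorithm: the function `ratio_preferences`, the pseudocount, and the normalisation against the wildtype count are all assumptions chosen to illustrate one common form of the ratio approach for a single site.

```python
def ratio_preferences(pre_counts, post_counts, wildtype, pseudocount=1.0):
    """Naive per-site preferences from pre- and post-selection counts.

    pre_counts / post_counts: dicts mapping each amino acid at one site
    to its count before / after selection (same keys in both).
    wildtype: the wildtype amino acid at this site.
    pseudocount: assumed regulariser to avoid division by zero.
    Hypothetical helper; not part of dms_tools.
    """
    ratios = {}
    for aa in pre_counts:
        # Frequency of each amino acid relative to wildtype, before and after.
        pre_rel = (pre_counts[aa] + pseudocount) / (pre_counts[wildtype] + pseudocount)
        post_rel = (post_counts[aa] + pseudocount) / (post_counts[wildtype] + pseudocount)
        ratios[aa] = post_rel / pre_rel  # enrichment ratio
    total = sum(ratios.values())
    # Normalise so the preferences at this site sum to one.
    return {aa: ratio / total for aa, ratio in ratios.items()}


if __name__ == "__main__":
    pre = {"A": 1000, "G": 10, "V": 10}
    post = {"A": 1000, "G": 10, "V": 0}
    print(ratio_preferences(pre, post, wildtype="A"))
```

With counts this small for the mutant amino acids, the result depends heavily on the pseudocount, which hints at why a treatment that models the counts statistically can be preferable to raw ratios.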

The origin and evolution of maize in the American Southwest

Rute R da Fonseca, Bruce D Smith, Nathan Wales, Enrico Cappellini, Pontus Skoglund, Matteo Fumagalli, José Alfredo Samaniego, Christian Carøe, María C Ávila-Arcos, David E Hufnagel, Thorfinn Sand Korneliussen, Filipe Garrett Vieira, Mattias Jakobsson, Bernardo Arriaza, Eske Willerslev, Rasmus Nielsen, Matthew B Hufford, Anders Albrechtsen, Jeffrey Ross-Ibarra, M Thomas P Gilbert
doi: http://dx.doi.org/10.1101/013540

Maize offers an ideal system through which to demonstrate the potential of ancient population genomic techniques for reconstructing the evolution and spread of domesticates. The diffusion of maize from Mexico into the North American Southwest (SW) remains contentious, with the available evidence being restricted to morphological studies of ancient maize plant material. We captured 1 Mb of nuclear DNA from 32 archaeological maize samples spanning 6000 years and compared them with modern landraces including those from the Mexican West coast and highlands. We found that the initial diffusion of domesticated maize into the SW is likely to have occurred through a highland route. However, by 2000 years ago a Pacific coastal corridor was also being used. Furthermore, we could distinguish genes that were selected for early during domestication (such as zagl1, involved in shattering) from genes that changed in the SW context (e.g. related to sugar content and adaptation to drought), likely as a response to the local arid environment and new cultural uses of maize.

Response of polygenic traits under stabilising selection and mutation when loci have unequal effects

Kavita Jain, Wolfgang Stephan
(Submitted on 9 Jan 2015)

We consider an infinitely large population under stabilising selection and mutation in which the allelic effects determining a polygenic trait vary between loci. We obtain analytical expressions for the stationary genetic variance as a function of the distribution of effects, mutation rate and selection coefficient. We also study the dynamics of the allele frequencies, focussing on short-term evolution of the phenotypic mean as it approaches the optimum after an environmental change. We find that when most effects are small, the genetic variance does not change appreciably during adaptation, and the time until the phenotypic mean reaches the optimum is short if the number of loci is large. However, when most effects are large, the change of the variance during the adaptive process cannot be neglected. In this case, the short-term dynamics may be described by that of a single locus of large effect. Our results may be used to understand polygenic selection driving rapid adaptation.