The role of standing variation in geographic convergent adaptation
Peter L. Ralph, Graham Coop
The extent to which populations experiencing shared selective pressures adapt through a shared genetic response is relevant to many questions in evolutionary biology. In a number of well-studied traits and species, it appears that convergent evolution within species is common. In this paper, we explore how standing, deleterious genetic variation contributes to convergent genetic responses in a geographically spread population, extending our previous work on the topic. Geographically limited dispersal slows the spread of each selected allele, hence allowing other alleles (newly arisen mutants or those present as standing variation) to spread before any one comes to dominate the population. When such alleles meet, their progress is substantially slowed: if the alleles are selectively equivalent, they mix slowly, dividing the species range into a random tessellation, which can be well understood by analogy to a Poisson process model of crystallization. In this framework, we derive the geographic scale over which a typical allele is expected to dominate, the time it takes the species to adapt as a whole, and the proportion of adaptive alleles that arise from standing variation. Finally, we explore how negative pleiotropic effects of alleles before an environment change can bias the subset of alleles that get to contribute to a species' adaptive response. We apply the results to the many geographically localized G6PD deficiency alleles thought to confer resistance to malaria, whose large mutational target size and deleterious effects make them likely candidates to have been present as deleterious standing variation. We find that the numbers and geographic spread of these alleles match our predictions reasonably well, which suggests that these alleles arose both from standing variation and from new mutations since the advent of malaria. Our results suggest that much of adaptation may be geographically local even when selection pressures are widespread.
We close by discussing the implications of these results for arguments of species coherence and the nature of divergence between species.
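The crystallization analogy in this abstract can be illustrated with a small simulation. The sketch below is not the authors' model or code: it assumes a one-dimensional range, a single constant wave speed shared by all alleles, and origins that arise as a homogeneous Poisson process in space and time. The function name and all parameter names are hypothetical.

```python
import random

def simulate_tessellation(length=100.0, rate=0.01, speed=1.0, t_max=200.0,
                          n_grid=500, seed=1):
    """Toy 1D crystallization: each location goes to the first allele to arrive.

    Assumptions (illustrative only): origins form a homogeneous Poisson
    process on [0, length] x [0, t_max] with intensity `rate`, and every
    allele spreads outward at the same constant `speed`.
    """
    rng = random.Random(seed)
    # Number of origins is Poisson with mean rate * length * t_max; draw it by
    # counting unit-rate exponential arrivals that fall before the mean.
    mean_points = rate * length * t_max
    n = 0
    total = rng.expovariate(1.0)
    while total < mean_points:
        n += 1
        total += rng.expovariate(1.0)
    # Each origin is a (position, time) pair, uniform over the space-time box.
    origins = [(rng.uniform(0.0, length), rng.uniform(0.0, t_max))
               for _ in range(n)]
    grid = [length * (i + 0.5) / n_grid for i in range(n_grid)]
    owners = []
    for x in grid:
        # Allele j reaches x at time t_j + |x - x_j| / speed. Origins that
        # land in already-occupied territory never win this minimum, so they
        # drop out without any special handling.
        best = min(range(n),
                   key=lambda j: origins[j][1] + abs(x - origins[j][0]) / speed,
                   default=None)
        owners.append(best)
    return origins, owners
```

Calling `origins, owners = simulate_tessellation()` and then `len(set(owners))` gives the number of distinct alleles that end up holding part of the range in one realization; lowering `speed` or raising `rate` in this toy model tends to produce more, smaller patches.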
This month set a new record for traffic to Haldane’s Sieve; thanks to everyone for the support. The most viewed posts were:
XWAS: a toolset for genetic data analysis and association studies of the X chromosome
Diana Chang, Feng Gao, Alon Keinan
Summary: We present XWAS (chromosome X-Wide Analysis tool-Set), a toolset specially designed for analysis of the X chromosome in association studies, both at the level of single markers and at the level of entire genes. It further offers other X-specific analysis tools, including quality control (QC) procedures for X-linked data. We have applied and tested this software by carrying out several X-wide association studies of autoimmune diseases. Availability and Implementation: The XWAS software package, which includes scripts, the binary executable PLINK/XWAS and all source code, is freely available for download from http://keinanlab.cb.bscb.cornell.edu/content/tools-data. PLINK/XWAS is implemented in C++, and other features are implemented in shell scripts and Perl. This software package is designed for Linux systems.
Estimating gene expression and codon specific translational efficiencies, mutation biases, and selection coefficients from genomic data
Michael Gilchrist, Wei-Chen Chen, Premal Shah, Russell Zaretzki
The time and cost of generating a genomic dataset are expected to continue to decline dramatically in the upcoming years. As a result, extracting biologically meaningful information from this continuing flood of data is a major challenge in biology. In response, we present a powerful Bayesian MCMC method based on a nested model of protein synthesis and population genetics. Analyzing the patterns of codon usage observed within a genome, our algorithm extracts and decouples information on codon-specific translational efficiencies and mutation biases as well as gene-specific expression levels for all coding sequences. This information can be combined to generate gene- and codon-specific estimates of selection on synonymous substitutions. One major advance over previous work is that our method can be used without independent measurements of gene expression. Using the Saccharomyces cerevisiae S288c genome, we compare our model fits with and without independent gene expression measurements and observe an exceptionally high correlation between our codon-specific parameters and gene-specific expression levels (ρ > 0.99 in all cases). We also observe robust correlations between our predictions generated without independent expression measurements and previously published estimates of mutation bias, ribosome pausing time, and empirical estimates of mRNA abundance (ρ = 0.53-0.72). Our results indicate that failing to take mutation bias into account can lead to the misidentification of an amino acid’s ‘optimal’ codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome-scale patterns of codon usage and that this information can be accessed through carefully formulated, biologically based models.
WFABC: a Wright-Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data
Matthieu Foll, Hyunjin Shim, Jeffrey D. Jensen
With novel developments in sequencing technologies, time-sampled data are becoming more available and accessible. Naturally, there have been parallel efforts to infer population genetic parameters from these datasets. Here, we compare and analyze four recent approaches based on the Wright-Fisher model for inferring selection coefficients (s) given effective population size (Ne), using simulated temporal datasets. Furthermore, we demonstrate the advantage of a recently proposed ABC-based method that is able to correctly infer genome-wide average Ne from time-serial data, which is then set as a prior for inferring per-site selection coefficients accurately and precisely. We implement this ABC method in new software and apply it to a classical time-serial dataset of the medionigra genotype in the moth Panaxia dominula. By implementing an estimator of the dominance ratio (h), we show that a recessive lethal model is the best explanation for the observed variation in allele frequency.
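The general idea of inferring s from time-sampled allele frequencies by ABC can be sketched as a plain rejection sampler. This is a deliberately simplified illustration and not the WFABC software: it assumes a haploid Wright-Fisher model, treats Ne as known rather than estimating it first, uses a uniform prior on s chosen arbitrarily, and compares whole trajectories by squared distance instead of using summary statistics. All function and parameter names are hypothetical.

```python
import random

def wf_trajectory(n_e, s, p0, generations, rng):
    """Haploid Wright-Fisher allele-frequency trajectory under selection s."""
    p = p0
    traj = [p]
    for _ in range(generations):
        # Deterministic selection step: relative fitness 1 + s versus 1.
        p_sel = p * (1.0 + s) / (1.0 + s * p)
        # Drift step: binomial sampling of n_e offspring.
        count = sum(1 for _ in range(n_e) if rng.random() < p_sel)
        p = count / n_e
        traj.append(p)
    return traj

def abc_rejection(observed, n_e, p0, n_sims=500, keep=0.05,
                  prior=(-0.2, 0.5), seed=0):
    """Keep the prior draws of s whose simulated trajectory is closest to `observed`."""
    rng = random.Random(seed)
    generations = len(observed) - 1
    draws = []
    for _ in range(n_sims):
        s = rng.uniform(prior[0], prior[1])  # assumed uniform prior on s
        sim = wf_trajectory(n_e, s, p0, generations, rng)
        dist = sum((a - b) ** 2 for a, b in zip(sim, observed))
        draws.append((dist, s))
    draws.sort()
    n_keep = max(1, int(keep * n_sims))
    # The accepted values approximate a posterior sample for s.
    return [s for _, s in draws[:n_keep]]
```

A quick check is to simulate a trajectory with a known s using `wf_trajectory`, pass it to `abc_rejection` as `observed`, and see whether the mean of the accepted values lands near the truth; with small Ne or few generations, drift makes the accepted sample wide.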
Thinking too positive? Revisiting current methods of population-genetic selection inference
Claudia Bank, Gregory B Ewing, Anna Ferrer-Admetlla, Matthieu Foll, Jeffrey D Jensen
In the age of next-generation sequencing, the availability of increasing amounts and quality of data at decreasing cost ought to allow for a better understanding of how natural selection is shaping the genome than ever before. Yet, alternative forces such as demography and background selection obscure the footprints of positive selection that we would like to identify. Here, we illustrate recent developments in this area, and outline a roadmap for improved selection inference. We argue (1) that the development and obligatory use of advanced simulation tools is necessary for improved identification of selected loci, (2) that genomic information from multiple time points will enhance the power of inference, and (3) that results from experimental evolution should be utilized to better inform population-genomic studies.
On the prospect of identifying adaptive loci in recently bottlenecked populations
Yu-Ping Poh, Vera S Domingues, Hopi Hoekstra, Jeffrey Jensen
Identifying adaptively important loci in recently bottlenecked populations—be it natural selection acting on a population following the colonization of novel habitats in the wild, or artificial selection during the domestication of a breed—remains a major challenge. Here we report the results of a simulation study examining the performance of available population-genetic tools for identifying genomic regions under selection. To illustrate our findings, we examined the interplay between selection and demography in two species of Peromyscus mice, for which we have independent evidence of selection acting on phenotype as well as functional evidence identifying the underlying genotype. With this unusual information, we tested whether population-genetic-based approaches could have been utilized to identify the adaptive locus. Contrary to published claims, we conclude that the use of the background site frequency spectrum as a null model is largely ineffective in bottlenecked populations. Results are quantified both for site frequency spectrum and linkage disequilibrium-based predictions, and are found to hold true across a large parameter space that encompasses many species and populations currently under study. These results suggest that the genomic footprint left by selection on both new and standing variation in strongly bottlenecked populations will be difficult, if not impossible, to find using current approaches.
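For readers unfamiliar with the site frequency spectrum mentioned in this abstract, the following minimal sketch shows how an unfolded spectrum is tabulated from a sample. It is illustrative only and unrelated to the authors' simulation pipeline; it assumes haplotypes are coded as 0 (ancestral) and 1 (derived), and the function name is hypothetical.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry k-1 counts sites whose derived allele is in k of n haplotypes."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)  # frequency classes 1 .. n-1
    # zip(*haplotypes) iterates over sites (columns) rather than haplotypes (rows).
    for site in zip(*haplotypes):
        k = sum(site)
        if 0 < k < n:  # skip sites that are monomorphic in the sample
            sfs[k - 1] += 1
    return sfs
```

For four haplotypes `[[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0], [0, 1, 1, 0]]`, the first site is a singleton, the second is fixed for the derived allele, the third is a doubleton, and the fourth is fixed for the ancestral allele, so the function returns `[1, 1, 0]`.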