Inference of evolutionary forces acting on human biological pathways

Josephine T Daub, Isabelle Dupanloup, Marc Robinson-Rechavi, Laurent Excoffier
doi: http://dx.doi.org/10.1101/009928

Because natural selection is likely to act on multiple genes underlying a given phenotypic trait, we study here the potential effect of ongoing and past selection on the genetic diversity of human biological pathways. We first show that genes included in gene sets are generally under stronger selective constraints than other genes and that their evolutionary response is correlated. We then introduce a new procedure to detect selection at the pathway level based on a decomposition of the classical McDonald-Kreitman test extended to multiple genes. This new test, called 2DNS, detects outlier gene sets and takes into account past demographic effects as well as evolutionary constraints specific to gene sets. Selective forces acting on gene sets can be easily identified by a mere visual inspection of the position of the gene sets relative to their 2D null distribution. We thus find several outlier gene sets that show signals of positive, balancing, or purifying selection, but also others showing an ancient relaxation of selective constraints. The principle of the 2DNS test can also be applied to other genomic contrasts. For instance, the comparison of patterns of polymorphisms private to African and non-African populations reveals that most pathways show a higher proportion of non-synonymous mutations in non-Africans than in Africans, potentially due to different demographic histories and selective pressures.
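
For readers unfamiliar with the classical McDonald-Kreitman test that 2DNS builds on, here is a minimal sketch of the single-table version. It is not the 2DNS procedure itself, which the abstract does not specify in detail. The function names and the example counts are illustrative assumptions; to get a gene-set summary one could pool counts over genes, which is again an assumption rather than the authors' method.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

def mk_test(dn, ds, pn, ps):
    """Classical McDonald-Kreitman summary for one gene (or pooled genes).

    dn, ds: non-synonymous / synonymous fixed differences (divergence)
    pn, ps: non-synonymous / synonymous polymorphisms
    """
    # Neutrality index: odds ratio of the table; undefined if a denominator is 0.
    ni = (pn * ds) / (ps * dn) if ps * dn else float("nan")
    alpha = 1.0 - ni  # estimated fraction of adaptive non-synonymous substitutions
    return ni, alpha, fisher_exact_two_sided(dn, ds, pn, ps)

# Illustrative counts only (hypothetical input, not taken from the preprint).
ni, alpha, p_value = mk_test(dn=7, ds=17, pn=2, ps=42)
```

A neutrality index below 1 indicates an excess of non-synonymous divergence relative to polymorphism, the usual signature of positive selection; a value above 1 points toward segregating, mildly deleterious variants.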

The Drosophila Genome Nexus: a population genomic resource of 605 Drosophila melanogaster genomes, including 197 genomes from a single ancestral range population

Justin Lack, Charis Cardeno, Marc Crepeau, William Taylor, Russ Corbett-Detig, Kristian Stevens, Charles H. Langley, John Pool
doi: http://dx.doi.org/10.1101/009886

Hundreds of wild-derived D. melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach, and settled on an assembly strategy that utilizes two alignment programs and incorporates both SNPs and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets (previous DPGP releases and the DGRP freeze 2.0), and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 605 consistently aligned genomes, and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.
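
The "updated reference" step between the two mapping rounds can be pictured with a small sketch: SNPs and short indels called in the first round are written into the reference before reads are mapped again. The function name, the variant tuple layout, and the toy sequence are assumptions for illustration; the actual pipeline relies on alignment and variant-calling programs that are not reproduced here.

```python
def apply_variants(reference, variants):
    """Return a copy of `reference` with non-overlapping variants applied.

    variants: iterable of (pos, ref, alt) with 0-based `pos`. SNPs have
    len(ref) == len(alt) == 1; short indels differ in length.
    """
    seq = reference
    # Apply right-to-left so that indels do not shift coordinates still to be used.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference mismatch at {pos}: expected {ref!r}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq

# Toy example: one SNP, one 1-bp deletion, one 2-bp insertion.
updated = apply_variants(
    "ACGTACGT",
    [(1, "C", "T"), (4, "AC", "A"), (7, "T", "TGG")],
)
```

Mapping reads against a reference that already carries a sample's own variants reduces the mismatch penalty near those sites, which is the motivation for a second round.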

Reticulate speciation and adaptive introgression in the Anopheles gambiae species complex

Jacob Crawford, Michelle M. Riehle, Wamdaogo M. Guelbeogo, Awa Gneme, N’fale Sagnon, Kenneth D. Vernick, Rasmus Nielsen, Brian P. Lazzaro
doi: http://dx.doi.org/10.1101/009837

Species complexes are common, especially among insect disease vectors, and understanding how barriers to gene flow among these populations become established or violated is critical for implementation of vector-targeting disease control. Anopheles gambiae, the primary vector of human malaria in sub-Saharan Africa, exists as a series of ecologically specialized populations that are phylogenetically nested within a species complex. These populations exhibit varying degrees of reproductive isolation, sometimes recognized as distinct subspecies. We have sequenced 32 complete genomes from field-captured individuals of Anopheles gambiae, Anopheles gambiae M form (recently named A. coluzzii), sister species A. arabiensis, and the recently discovered “GOUNDRY” subgroup of A. gambiae that is highly susceptible to Plasmodium. Amidst a backdrop of strong reproductive isolation and adaptive differentiation, we find evidence for adaptive introgression of autosomal chromosomal regions among species and populations. The X chromosome, however, remains strongly differentiated among all of the subpopulations, pointing to a disproportionately large effect of X chromosome genes in driving speciation among anophelines. Strikingly, we find that autosomal introgression has occurred from contemporary hybridization among A. gambiae and A. arabiensis despite strong divergence (~5× higher than autosomal divergence) and isolation on the X chromosome. We find a large region of the X chromosome that has recently swept to fixation in the GOUNDRY subpopulation, which may be an inversion that serves as a partial barrier to gene flow. We also find that the GOUNDRY population is highly inbred, implying increased philopatry in this population. 
Our results show that ecological speciation in this species complex results in genomic mosaicism of divergence and adaptive introgression that creates a reticulate gene pool connecting vector populations across the speciation continuum with important implications for malaria control efforts.

The role of standing variation in geographic convergent adaptation

Peter L. Ralph, Graham Coop
doi: http://dx.doi.org/10.1101/009803

The extent to which populations experiencing shared selective pressures adapt through a shared genetic response is relevant to many questions in evolutionary biology. In a number of well studied traits and species, it appears that convergent evolution within species is common. In this paper, we explore how standing, deleterious genetic variation contributes to convergent genetic responses in a geographically spread population, extending our previous work on the topic. Geographically limited dispersal slows the spread of each selected allele, hence allowing other alleles (newly arisen mutants or alleles present as standing variation) to spread before any one comes to dominate the population. When such alleles meet, their progress is substantially slowed: if the alleles are selectively equivalent, they mix slowly, dividing the species range into a random tessellation, which can be well understood by analogy to a Poisson process model of crystallization. In this framework, we derive the geographic scale over which a typical allele is expected to dominate, the time it takes the species to adapt as a whole, and the proportion of adaptive alleles that arise from standing variation. Finally, we explore how negative pleiotropic effects of alleles before an environment change can bias the subset of alleles that get to contribute to a species' adaptive response. We apply the results to the many geographically localized G6PD deficiency alleles thought to confer resistance to malaria, whose large mutational target size and deleterious effects make them likely candidates to have been present as deleterious standing variation. We find that the numbers and geographic spread of these alleles match our predictions reasonably well, which suggests that these arose both from standing variation and new mutations since the advent of malaria. Our results suggest that much of adaptation may be geographically local even when selection pressures are widespread.
We close by discussing the implications of these results for arguments of species coherence and the nature of divergence between species.

Most viewed on Haldane’s Sieve: September 2014

This month set a new record for traffic to Haldane’s Sieve; thanks to everyone for the support. The most viewed posts were:

XWAS: a toolset for genetic data analysis and association studies of the X chromosome

Diana Chang, Feng Gao, Alon Keinan
doi: http://dx.doi.org/10.1101/009795

Summary: We present XWAS (chromosome X-Wide Analysis tool-Set), a toolset specially designed for analysis of the X chromosome in association studies, at the level of both single markers and entire genes. It further offers other X-specific analysis tools, including quality control (QC) procedures for X-linked data. We have applied and tested this software by carrying out several X-wide association studies of autoimmune diseases. Availability and Implementation: The XWAS software package, which includes scripts, the binary executable PLINK/XWAS, and all source code, is freely available for download from http://keinanlab.cb.bscb.cornell.edu/content/tools-data. PLINK/XWAS is implemented in C++; other features are implemented in shell scripts and Perl. This software package is designed for Linux systems.

Estimating gene expression and codon specific translational efficiencies, mutation biases, and selection coefficients from genomic data

Michael Gilchrist, Wei-Chen Chen, Premal Shah, Russell Zaretzki
doi: http://dx.doi.org/10.1101/009670

The time and cost of generating a genomic dataset are expected to continue to decline dramatically in the coming years. As a result, extracting biologically meaningful information from this continuing flood of data is a major challenge in biology. In response, we present a powerful Bayesian MCMC method based on a nested model of protein synthesis and population genetics. Analyzing the patterns of codon usage observed within a genome, our algorithm extracts and decouples information on codon specific translational efficiencies and mutation biases as well as gene specific expression levels for all coding sequences. This information can be combined to generate gene and codon specific estimates of selection on synonymous substitutions. One major advance over previous work is that our method can be used without independent measurements of gene expression. Using the Saccharomyces cerevisiae S288c genome, we compare our model fits with and without independent gene expression measurements and observe an exceptionally high correlation between our codon specific parameters and gene specific expression levels (ρ > 0.99 in all cases). We also observe robust correlations between our predictions generated without independent expression measurements and previously published estimates of mutation bias, ribosome pausing time, and empirical estimates of mRNA abundance (ρ=0.53-0.72). Our results indicate that failing to take mutation bias into account can lead to the misidentification of an amino acid’s ‘optimal’ codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome scale patterns of codon usage and this information can be accessed through carefully formulated, biologically based models.

WFABC: a Wright-Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data

Matthieu Foll, Hyunjin Shim, Jeffrey D. Jensen
doi: http://dx.doi.org/10.1101/009696

With novel developments in sequencing technologies, time-sampled data are becoming more available and accessible. Naturally, there have been efforts in parallel to infer population genetic parameters from these datasets. Here, we compare and analyze four recent approaches based on the Wright-Fisher model for inferring selection coefficients (s) given effective population size (Ne), with simulated temporal datasets. Furthermore, we demonstrate the advantage of a recently proposed ABC-based method that is able to correctly infer genome-wide average Ne from time-serial data, which is then set as a prior for inferring per-site selection coefficients accurately and precisely. We implement this ABC method in new software and apply it to a classical time-serial dataset of the medionigra genotype in the moth Panaxia dominula. We show that a recessive lethal model is the best explanation for the observed variation in allele frequency by implementing an estimator of the dominance ratio (h).
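
The general idea of ABC on a Wright-Fisher model can be sketched as follows. This is a simplified illustration under stated assumptions, not the WFABC software: it uses a haploid model with a fixed Ne, a flat prior on s, and a plain frequency distance instead of the summary statistics the method actually relies on. All names and default values are hypothetical.

```python
import random

def wf_trajectory(ne, s, p0, generations, rng):
    """Haploid Wright-Fisher allele frequencies under genic selection."""
    p, traj = p0, [p0]
    for _ in range(generations):
        # Deterministic selection step, then binomial drift among ne individuals.
        p_sel = p * (1 + s) / (1 + s * p)
        p = sum(rng.random() < p_sel for _ in range(ne)) / ne
        traj.append(p)
    return traj

def abc_rejection(observed, sample_gens, ne, p0, n_sims, eps, rng):
    """Keep prior draws of s whose simulated frequencies stay within eps."""
    accepted = []
    for _ in range(n_sims):
        s = rng.uniform(-0.2, 0.2)  # assumed flat prior, for illustration only
        traj = wf_trajectory(ne, s, p0, max(sample_gens), rng)
        # Distance: largest absolute frequency gap at the sampled generations.
        dist = max(abs(traj[g] - o) for g, o in zip(sample_gens, observed))
        if dist <= eps:
            accepted.append(s)
    return accepted

# Hypothetical time-sampled frequencies at generations 5, 10 and 15.
rng = random.Random(1)
posterior_s = abc_rejection(
    observed=[0.15, 0.25, 0.40], sample_gens=[5, 10, 15],
    ne=200, p0=0.10, n_sims=300, eps=0.10, rng=rng,
)
```

The accepted values approximate the posterior of s given the data; tightening `eps` trades acceptance rate for accuracy. In the approach described above, Ne would first be estimated genome-wide and passed in as a prior rather than fixed.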

Thinking too positive? Revisiting current methods of population-genetic selection inference

Claudia Bank, Gregory B Ewing, Anna Ferrer-Admettla, Matthieu Foll, Jeffrey D Jensen
doi: http://dx.doi.org/10.1101/009654

In the age of next-generation sequencing, the availability of increasing amounts and quality of data at decreasing cost ought to allow for a better understanding of how natural selection is shaping the genome than ever before. Yet, alternative forces such as demography and background selection obscure the footprints of positive selection that we would like to identify. Here, we illustrate recent developments in this area, and outline a roadmap for improved selection inference. We argue (1) that the development and obligatory use of advanced simulation tools is necessary for improved identification of selected loci, (2) that genomic information from multiple time points will enhance the power of inference, and (3) that results from experimental evolution should be utilized to better inform population-genomic studies.

On the prospect of identifying adaptive loci in recently bottlenecked populations

Yu-Ping Poh, Vera S Domingues, Hopi Hoekstra, Jeffrey Jensen
doi: http://dx.doi.org/10.1101/009456

Identifying adaptively important loci in recently bottlenecked populations—be it natural selection acting on a population following the colonization of novel habitats in the wild, or artificial selection during the domestication of a breed—remains a major challenge. Here we report the results of a simulation study examining the performance of available population-genetic tools for identifying genomic regions under selection. To illustrate our findings, we examined the interplay between selection and demography in two species of Peromyscus mice, for which we have independent evidence of selection acting on phenotype as well as functional evidence identifying the underlying genotype. With this unusual information, we tested whether population-genetic-based approaches could have been utilized to identify the adaptive locus. Contrary to published claims, we conclude that the use of the background site frequency spectrum as a null model is largely ineffective in bottlenecked populations. Results are quantified both for site frequency spectrum and linkage disequilibrium-based predictions, and are found to hold true across a large parameter space that encompasses many species and populations currently under study. These results suggest that the genomic footprint left by selection on both new and standing variation in strongly bottlenecked populations will be difficult, if not impossible, to find using current approaches.
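
As a reminder of what the site frequency spectrum summarizes, the sketch below tabulates an unfolded SFS from a handful of haplotypes. The input encoding (strings of 0/1 with 1 marking the derived allele) and the function name are assumptions for illustration; the simulation study itself is not reproduced.

```python
from collections import Counter

def site_frequency_spectrum(haplotypes):
    """Unfolded SFS from equal-length 0/1 haplotype strings (1 = derived)."""
    n = len(haplotypes)
    # Count derived copies at each site across the sample.
    counts = Counter(
        sum(h[i] == "1" for h in haplotypes)
        for i in range(len(haplotypes[0]))
    )
    # Entry k-1 holds the number of sites with k derived copies, k = 1..n-1;
    # sites that are fixed or absent in the sample are not segregating.
    return [counts.get(k, 0) for k in range(1, n)]

# Four toy haplotypes, four sites: two singletons, one fixed site, one absent.
sfs = site_frequency_spectrum(["0101", "0100", "1100", "0100"])
```

Using the genome-wide ("background") spectrum as a null means asking whether a candidate region's spectrum departs from it; the abstract's point is that a recent bottleneck distorts the background so strongly that this contrast loses power.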