Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila

Alan O. Bergland, Emily L. Behrman, Katherine R. O’Brien, Paul S. Schmidt, Dmitri A. Petrov
(Submitted on 20 Mar 2013)

In many species, genomic data have revealed pervasive adaptive evolution indicated by the near fixation of beneficial alleles. However, when selection pressures are highly variable along a species range or through time, adaptive alleles may persist at intermediate frequencies for long periods. So-called balanced polymorphisms have long been understood to be an important component of standing genetic variation, yet direct evidence of the ubiquity of balancing selection has remained elusive. We hypothesized that environmental fluctuations between seasons in a North American orchard would impose temporally variable selection on Drosophila melanogaster and consequently maintain allelic variation at polymorphisms adaptively evolving in response to climatic variation. We identified hundreds of polymorphisms whose frequency oscillates among seasons and argue that these loci are subject to strong, temporally variable selection. We show that adaptively oscillating polymorphisms are often millions of years old, predating the divergence between D. melanogaster and D. simulans, and that a subset of these polymorphisms respond predictably to an acute frost event. Taken together, our results demonstrate that rapid temporal fluctuations in climate over generational scales are a predominant force that maintains adaptive alleles and promotes genetic diversity.

Genomic Sequence Diversity and Population Structure of Saccharomyces cerevisiae Assessed by RAD-seq

Gareth A. Cromie, Katie E. Hyma, Catherine L. Ludlow, Cecilia Garmendia-Torres, Teresa L. Gilbert, Patrick May, Angela A. Huang, Aimée M. Dudley, Justin C. Fay
(Submitted on 20 Mar 2013)

The budding yeast Saccharomyces cerevisiae is important for human food production and as a model organism for biological research. The genetic diversity contained in the global population of yeast strains represents a valuable resource for a number of fields, including genetics, bioengineering, and studies of evolution and population structure. Here, we apply a multiplexed, reduced genome sequencing strategy (known as RAD-seq) to genotype a large collection of S. cerevisiae strains, isolated from a wide range of geographical locations and environmental niches. The method permits the sequencing of the same 1% of all genomes, producing a multiple sequence alignment of 116,880 bases across 262 strains. We find diversity among these strains is principally organized by geography, with European, North American, Asian and African/S. E. Asian populations defining the major axes of genetic variation. At a finer scale, small groups of strains from cacao, olives and sake are defined by unique variants not present in other strains. One population, containing strains from a variety of fermentations, exhibits high levels of heterozygosity and mixtures of alleles from European and Asian populations, indicating an admixed origin for this group. In the context of this global diversity, we demonstrate that a collection of seven strains commonly used in the laboratory encompasses only one quarter of the genetic diversity present in the full collection of strains, underscoring the relatively limited genetic diversity captured by the current set of lab strains. We propose a model of geographic differentiation followed by human-associated admixture, primarily between European and Asian populations and more recently between European and North American populations. The large collection of genotyped yeast strains characterized here will provide a useful resource for the broad community of yeast researchers.

Loss and Recovery of Genetic Diversity in Adapting Populations of HIV

Pleuni Pennings, Sergey Kryazhimskiy, John Wakeley
(Submitted on 15 Mar 2013)

A population’s adaptive potential is the likelihood that it will adapt in response to an environmental challenge, e.g., develop resistance in response to drug treatment. The effective population size inferred from genetic diversity at neutral sites has traditionally been taken as a major predictor of adaptive potential. However, recent studies demonstrate that such an effective population size vastly underestimates the population’s adaptive potential (Karasov 2010).
Here we use data from treated HIV-infected patients (Bacheler2000) to estimate the effective size of HIV populations relevant for adaptation. Our estimate is based on the frequencies of soft and hard selective sweeps of a known resistance mutation, K103N. We observe that 41% of HIV populations in this study acquire resistance via at least two functionally equivalent but distinct mutations which sweep to fixation without significantly reducing genetic diversity at neighboring sites (soft selective sweeps). We further estimate that 20% of populations acquire a resistant allele via a single mutation that sweeps to fixation and drastically reduces genetic diversity (hard selective sweeps). We infer that the effective population size that determines the adaptive potential of within-patient HIV populations is approximately 150,000. Our estimate is two orders of magnitude higher than a classical estimate based on diversity at synonymous sites.
Three not mutually exclusive reasons can explain this discrepancy:
(1) some synonymous mutations may be under selection;
(2) highly beneficial mutations may be less affected by ongoing linked selection than synonymous mutations; and
(3) synonymous diversity may not be at its expected equilibrium because it recovers slowly from sweeps and bottlenecks.
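
To see why the relative frequency of soft and hard sweeps carries information about population size, here is a minimal sketch. It is not the authors' estimator. It assumes the standard sampling-formula approximation from the soft-sweep literature: with a population-scaled mutation rate toward the resistant allele, theta (taken here as 2 * Ne * mu, which is an assumed convention), the chance that n sampled resistant lineages all trace back to a single mutational origin is the product of i / (i + theta) for i = 1 to n - 1. The function names are hypothetical.

```python
def prob_single_origin(theta, n):
    """Probability that n sampled adaptive lineages share one mutational
    origin (a hard sweep in the sample), under the assumed
    sampling-formula approximation."""
    if theta < 0 or n < 1:
        raise ValueError("theta must be >= 0 and n must be >= 1")
    p = 1.0
    for i in range(1, n):
        p *= i / (i + theta)
    return p


def prob_soft(theta, n):
    """Probability that the sample contains at least two independent
    origins of the adaptive allele (a soft sweep in the sample)."""
    return 1.0 - prob_single_origin(theta, n)


if __name__ == "__main__":
    # Illustrative theta values only; none of these come from the paper.
    for theta in (0.01, 0.1, 1.0, 10.0):
        print(theta, prob_soft(theta, n=10))
```

Under this approximation the soft-sweep probability rises steadily with theta, so an observed mix of soft and hard sweeps constrains theta, and through an assumed per-site mutation rate it constrains the effective population size.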

A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes

John A. Capra, Melissa J. Hubisz, Dennis Kostka, Katherine S. Pollard, Adam Siepel
(Submitted on 9 Mar 2013)

GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts ~1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. We also find some evidence that they contribute to the fixation of deleterious alleles, including an enrichment for disease-associated polymorphisms. These tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages; they supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available.

Our paper: Soft selective sweeps are the primary mode of recent adaptation in Drosophila melanogaster

This guest post is by Nandita R. Garud, Philipp W. Messer, Erkan O. Buzbas, and Dmitri A. Petrov, on their paper Soft selective sweeps are the primary mode of recent adaptation in Drosophila melanogaster, arXived here.

We typically think of adaptive events as arising from single de novo mutations that sweep through the population one at a time. In this scenario, one expects to observe the signatures of hard selective sweeps, where a single haplotype rises to very high frequencies, removing variation in linked genomic regions. It is also possible, however, that adaptation could lead to signatures of soft sweeps. Soft sweeps are generated by multiple adaptive haplotypes rising in frequency at the same time, either because (i) the adaptive mutation comes from standing variation and thus had time to recombine onto multiple haplotypes, or (ii) because multiple de novo mutations arise virtually simultaneously. The second mode is likely in large populations or when the adaptive mutation rate per locus is high.

Soft sweeps have generally been considered a mere curiosity and most scans for adaptation focus on the hard sweep scenario. Despite this prevailing view, the three best-studied cases of adaptation in Drosophila at the loci Ace, CHKov1, and Cyp6g1 all show signatures of soft sweeps. In two cases (Ace and Cyp6g1), soft sweeps were generated by de novo mutations indicating that the population size in D. melanogaster relevant to adaptation is on the order of billions or larger. In one case (CHKov1), soft sweeps arose from standing variation. Surprisingly, we do not have very convincing cases of recent adaptation in Drosophila that generated hard sweeps.

Nevertheless, it remained an open question whether these three cases were the exception or the norm. They are all related to pesticide or viral resistance, and it is entirely possible that much adaptation unrelated to human disturbance or immunity proceeds differently and might generate hard sweeps.

In this paper, we developed two haplotype statistics that allowed us to systematically identify hard and soft sweeps with similar power and then to differentiate them from each other. We applied these statistics to the Drosophila polymorphism data of ~150 fully sequenced, inbred strains available through the Drosophila Genetic Reference Panel (DGRP).

We found abundant signatures of recent and strong sweeps in the Drosophila genome with haplotype structure often extending over tens or even hundreds of kb. However, to our surprise, when we looked at the top 50 peaks, all of them showed signatures of soft sweeps, while we could not convincingly demonstrate the existence of any hard sweeps.

Our results suggest that hard sweeps might be exceedingly rare in Drosophila. Instead, it appears that adaptation in Drosophila primarily proceeds via soft sweeps and thus often involves standing genetic variation or recurrent de novo mutations. There are two caveats, however: One is that we were only able to study strong and recent adaptation. Such strong adaptation should “feel” recent population sizes that are close to the census size, whereas it should be insensitive to bottlenecks that have occurred in the distant past. Weaker adaptation, on the other hand, might take longer and thus would be sensitive to ancient bottlenecks or interference from other sweeps. Whether weak adaptation thus proceeds via hard sweeps remains to be seen. The second caveat is that much of adaptation might involve sweeps that are so soft and move so many haplotypes up in frequency that we cannot detect them. Similarly, adaptation could often be polygenic involving very subtle shifts in allele frequency at many loci. These modes would hardly leave any signatures of sweeps at all. Whichever way it is, it is becoming increasingly clear that adaptation in Drosophila and many other organisms is likely to be much more complex, much more common, and in many ways a much more turbulent process than we usually tend to think.

From Many, One: Genetic Control of Prolificacy during Maize Domestication

David M. Wills, Clinton Whipple, Shohei Takuno, Lisa E. Kursel, Laura M. Shannon, Jeffrey Ross-Ibarra, John F. Doebley
(Submitted on 4 Mar 2013)

A reduction in number and an increase in size of inflorescences is a common aspect of plant domestication. When maize was domesticated from teosinte, the number and arrangement of ears changed dramatically. Teosinte has long lateral branches that bear multiple small ears at their nodes and tassels at their tips. Maize has much shorter lateral branches that are tipped by a single large ear with no additional ears at the branch nodes. To investigate the genetic basis of this difference in prolificacy (the number of ears on a plant), we performed a genome-wide QTL scan. A large effect QTL for prolificacy (prol1.1) was detected on the short arm of chromosome one in a location that has previously been shown to influence multiple domestication traits. We fine-mapped prol1.1 to a 2.7 kb interval or causative region upstream of the grassy tillers1 gene, which encodes a homeodomain leucine zipper transcription factor. Tissue in situ hybridizations reveal that the maize allele of prol1.1 is associated with up-regulation of gt1 expression in the nodal plexus. Given that maize does not initiate secondary ear buds, the expression of gt1 in the nodal plexus in maize may suppress their initiation. Population genetic analyses indicate positive selection on the maize allele of prol1.1, causing a partial sweep that fixed the maize allele throughout most of domesticated maize. This work shows how a subtle cis-regulatory change in tissue specific gene expression altered plant architecture in a way that improved the harvestability of maize.

Soft selective sweeps are the primary mode of recent adaptation in Drosophila melanogaster

Nandita R. Garud, Philipp W. Messer, Erkan O. Buzbas, Dmitri A. Petrov
(Submitted on 5 Mar 2013)

Adaptation is often thought to leave the signature of a hard selective sweep, in which a single haplotype bearing the beneficial allele reaches high population frequency. However, an alternative and often-overlooked scenario is that of a soft selective sweep, in which multiple adaptive haplotypes sweep through the population simultaneously. Soft selective sweeps are likely either when adaptation proceeds from standing genetic variation or in large populations where adaptation is not mutation-limited. Current statistical methods are not well designed to test for soft sweeps, and thus are likely to miss these possibly numerous adaptive events because they look for characteristic reductions in heterozygosity. Here, we developed a statistical test based on a haplotype statistic, H12, capable of detecting both hard and soft sweeps with similar power. We used H12 to identify multiple genomic regions that have undergone recent and strong adaptation using a population sample of fully sequenced Drosophila melanogaster strains (DGRP). We then developed a second statistical test based on a statistic H2/H1 | H12, to test whether a given selective sweep detected by H12 is hard or soft. Surprisingly, when applying the test based on H2/H1 | H12 to the top 50 most extreme H12 candidates in the DGRP data, we reject the hard sweep hypothesis in every case. In contrast, all 50 cases show strong support (Bayes Factor >10) for a soft sweep model. Our results suggest that recent adaptation in North American populations of D. melanogaster has led primarily to soft sweeps either because it utilized standing genetic variation or because the short-term effective population size in D. melanogaster is on the order of billions or larger.
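
The haplotype statistics named above can be sketched in a few lines. The abstract does not spell out their definitions, so the following assumes the usual haplotype-homozygosity forms: H1 is the sum of squared haplotype frequencies in a window, H12 is the same sum after pooling the two most frequent haplotypes, and H2 is H1 without the most frequent haplotype's contribution. The function name and the toy samples are hypothetical, and the genome-wide windowing and the Bayes-factor test are not reproduced.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (H1, H12, H2/H1) for the haplotypes sampled in one window.

    Definitions are assumed, not taken from the abstract:
      H1  = sum_i p_i^2
      H12 = (p_1 + p_2)^2 + sum_{i>2} p_i^2
      H2  = H1 - p_1^2
    where p_1 >= p_2 >= ... are the sample haplotype frequencies.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])
    h2 = h1 - p1 * p1
    return h1, h12, h2 / h1


if __name__ == "__main__":
    # Toy windows: one dominant haplotype versus two common haplotypes.
    hard_like = ["A"] * 9 + ["B"]
    soft_like = ["A"] * 5 + ["B"] * 4 + ["C"]
    print(haplotype_stats(hard_like))
    print(haplotype_stats(soft_like))
```

In the toy windows both samples give a high H12, which is the point of pooling the top two haplotypes, while H2/H1 is much larger when two haplotypes are common than when a single one dominates.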

Deleterious synonymous mutations hitchhike to high frequency in HIV-1 env evolution

Fabio Zanini, Richard A. Neher
(Submitted on 4 Mar 2013)

Intrapatient HIV-1 evolution is dominated by selection on the protein level in the arms race with the adaptive immune system. When cytotoxic CD8+ T-cells or neutralizing antibodies target a new epitope, the virus often escapes via nonsynonymous mutations that impair recognition. Synonymous mutations do not affect this interplay and are often assumed to be neutral. We analyze longitudinal intrapatient data from the C2-V5 part of the envelope gene (env) and observe that synonymous derived alleles rarely fix even though they often reach high frequencies in the viral population. We find that synonymous mutations that disrupt base pairs in RNA stems flanking the variable loops of gp120 are more likely to be lost than other synonymous changes, hinting at a direct fitness effect of these stem-loop structures in the HIV-1 RNA. Computational modeling indicates that these synonymous mutations have a (Malthusian) selection coefficient of the order of -0.002 and that they are brought up to high frequency by hitchhiking on neighboring beneficial nonsynonymous alleles. The patterns of fixation of nonsynonymous mutations estimated from the longitudinal data and comparisons with computer models suggest that escape mutations in C2-V5 are only transiently beneficial, either because the immune system is catching up or because of competition between equivalent escapes.

Population Genetics of Rare Variants and Complex Diseases

M. Cyrus Maher, Lawrence H. Uricchio, Dara G. Torgerson, Ryan D. Hernandez
(Submitted on 12 Feb 2013)

Identifying drivers of complex traits from the noisy signals of genetic variation obtained from high-throughput genome sequencing technologies is a central challenge faced by human geneticists today. We hypothesize that the variants involved in complex diseases are likely to exhibit non-neutral evolutionary signatures. Uncovering the evolutionary history of all variants is therefore of intrinsic interest for complex disease research. However, doing so necessitates the simultaneous elucidation of the targets of natural selection and population-specific demographic history. Here we characterize the action of natural selection operating across complex disease categories, and use population genetic simulations to evaluate the expected patterns of genetic variation in large samples. We focus on populations that have experienced historical bottlenecks followed by explosive growth (consistent with most human populations), and describe the differences between evolutionarily deleterious mutations and those that are neutral. Genes associated with several complex disease categories exhibit stronger signatures of purifying selection than non-disease genes. In addition, loci identified through genome-wide association studies of complex traits also exhibit signatures consistent with being in regions recurrently targeted by purifying selection. Through simulations, we show that population bottlenecks and rapid growth enable deleterious rare variants to persist at low frequencies just as long as neutral variants, but low-frequency and common variants tend to be much younger than neutral variants. This has resulted in a large proportion of modern-day rare alleles that have a deleterious effect on function, and that potentially contribute to disease susceptibility.

Reproductive isolation between phylogeographic lineages scales with divergence

Sonal Singhal, Craig Moritz
(Submitted on 17 Jan 2013)

Phylogeographic studies frequently reveal multiple morphologically cryptic lineages within species. What is not yet clear is whether such lineages represent nascent species or evolutionary ephemera. To address this question, we compare five contact zones, each of which occurs between eco-morphologically cryptic lineages of skinks from the rainforests of the Australian Wet Tropics. Although the contacts likely formed concurrently in response to Holocene expansion from glacial refugia, we estimate that the divergence times (t) of the lineage-pairs range from 3.1 to 11.5 Myr. Multilocus analyses of the contact zones yielded estimates of reproductive isolation that are tightly correlated with divergence time and, for longer-diverged lineages (t > 5 Myr), substantial. These results show that phylogeographic splits of increasing depth can represent stages along the speciation continuum, even in the absence of overt change in ecologically relevant morphology.