Author post: Adaptive evolution is substantially impeded by Hill-Robertson interference in Drosophila

This guest post is by David Castellano and Adam Eyre-Walker on their preprint (with co-authors) Adaptive evolution is substantially impeded by Hill-Robertson interference in Drosophila.

Our paper “Adaptive evolution is substantially impeded by Hill-Robertson interference in Drosophila”, in which we investigate how both the rate of recombination and the mutation rate affect the rate of adaptive amino acid substitution, has been available at bioRxiv (http://dx.doi.org/10.1101/021600) since 27 June.

Population genetics theory predicts that the rate of adaptive evolution should depend upon the rate of recombination. Genes with low rates of recombination will suffer from Hill-Robertson interference (HRi), in which selected mutations interfere with each other (see the figure below). A newly arising advantageous mutation may find itself in competition for fixation with another advantageous mutation at a linked locus on another chromosome in the population, or in linkage disequilibrium with deleterious mutations, which will reduce its probability of fixation if it cannot recombine away from them.

A schematic HRi example among adaptive alleles (left) and among adaptive and deleterious alleles (right).

Likewise, it is expected that genes with higher mutation rates will undergo more adaptive evolution than genes with low mutation rates. More interestingly, an interaction between the rate of recombination and the rate of mutation is also expected: HRi should be more prevalent in genes with high mutation rates and low rates of recombination. No attempt has been made so far to quantify the overall impact of HRi on the rate of adaptive evolution for any given genome. In our paper we propose a way to quantify the number of adaptive substitutions lost due to HRi: approximately 27% of all the adaptive mutations that would have gone to fixation since the D. melanogaster – D. yakuba split, had there been free recombination, are lost due to HRi. Moreover, we are able to estimate how the fraction of adaptive amino acid substitutions lost to HRi depends on a gene’s mutation rate. In agreement with our expectations, genes with high mutation rates lose a significantly higher proportion of adaptive substitutions than genes with low mutation rates (43% vs 11%, respectively).
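To make the arithmetic behind a figure such as 27% concrete, here is a minimal sketch. It assumes the lost fraction is simply one minus the ratio of observed adaptive substitutions to the number expected under free recombination. The function name and the input counts are hypothetical illustrations, not values or methods taken from the preprint.

```python
def fraction_lost_to_hri(observed: float, expected_free_recombination: float) -> float:
    """Share of adaptive substitutions lost, relative to the free-recombination expectation.

    Both arguments are hypothetical counts; the preprint's estimation
    procedure is not reproduced here.
    """
    if expected_free_recombination <= 0:
        raise ValueError("expected count must be positive")
    if not 0 <= observed <= expected_free_recombination:
        raise ValueError("observed count must lie between 0 and the expected count")
    return 1.0 - observed / expected_free_recombination


# Made-up counts, chosen only to land on the same scale as the reported figure.
lost = fraction_lost_to_hri(observed=730, expected_free_recombination=1000)
print(f"{lost:.0%} of adaptive substitutions lost")
```

The same function could be applied separately to genes binned by mutation rate to compare the lost fraction between high and low mutation rate classes.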

An open question is to what extent HRi affects rates of adaptive evolution in other species. Moreover, the loss of adaptive substitutions to HRi can potentially tell us something important about the strength of selection acting on some advantageous mutations, since weakly selected mutations are those that are most likely to be affected by HRi. This will require further analysis and population genetic modeling, but in combination with other sources of information (for example, the dip in diversity around non-synonymous substitutions, the site frequency spectrum, and the high-frequency variants that are left by selective sweeps) it may be possible to infer much more about the distribution of fitness effects (DFE) of advantageous mutations than previously thought.

It will be of great interest to do similar analyses to those performed here in other species.

Comments very welcome!
David and Adam

A tree metric using structure and length to capture distinct phylogenetic signals

Michelle Kendall, Caroline Colijn
Subjects: Populations and Evolution (q-bio.PE)

Phylogenetic trees are a central tool in understanding evolution. They are typically inferred from sequence data, and capture evolutionary relationships through time. It is essential to be able to compare trees from different data sources (e.g. several genes from the same organisms) and different inference methods. We propose a new metric for robust, quantitative comparison of rooted, labeled trees. It enables clear visualizations of tree space, gives meaningful comparisons between trees, and can detect distinct islands of tree topologies in posterior distributions of trees. This makes it possible to select well-supported summary trees. We demonstrate our approach on Dengue fever phylogenies.

MuCor: Mutation Aggregation and Correlation

Karl W Kroll, Ann-Katherin Eisfeld, Gerard Lozanski, Clara D Bloomfield, John C Byrd, James S Blachly
doi: http://dx.doi.org/10.1101/022780

Motivation: There are many tools for variant calling and effect prediction, but little to tie together large sample groups. Aggregating, sorting, and summarizing variants and effects across a cohort is often done with ad hoc scripts that must be re-written for every new project. In response, we have written MuCor, a tool to gather variants from a variety of input formats (including multiple files per sample), perform database lookups and frequency calculations, and write many report types. In addition to use in large studies with numerous samples, MuCor can also be employed to directly compare variant calls from the same sample across two or more platforms, parameters, or pipelines. A companion utility, DepthGauge, measures coverage at regions of interest to increase confidence in calls.

Availability: Source code is freely available at https://github.com/blachlylab

Contact: james.blachly@osumc.edu

Supplementary data: Supplementary data, including detailed documentation, are available online.

Adaptive divergence in the bovine genome

William Barendse, Sean McWilliam, Rowan J Bunch, Blair E Harrison
doi: http://dx.doi.org/10.1101/022764

Cattle diverged during the Pleistocene into two subspecies, one in temperate and one in tropical environments. Here we have used next-generation sequencing of the indicine subspecies of cattle and compared it to the taurine subspecies. Although 23.8 million single nucleotide polymorphisms (SNPs) were found, the number of fixed amino acid substitutions between the taurine and indicine subspecies was low and consistent with the Haldane predictions for adaptive selection rather than with Neutral Theory. We noted 33 regions of enhanced divergence of nonsynonymous SNPs between the subspecies, which included an increased rate of deleterious variants. Signals of positive selection were found for genes associated with immunity, including the Bovine Major Histocompatibility Complex, which also showed an increased rate of deleterious amino acid variants. The genes important in sensing the environment, especially the olfactory system, showed a network-wide signal of positive selection.

What’s in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual

Lynsey K. Whitacre, Polyana C. Tizioto, JaeWoo Kim, Tad S. Sonstegard, Steven G. Schroeder, Leeson J. Alexander, Juan F. Medrano, Robert D. Schnabel, Jeremy F. Taylor, Jared E. Decker
doi: http://dx.doi.org/10.1101/022731

Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain unmapped. We generated de novo assemblies of unmapped reads from the DNA and RNA sequencing of the Bos taurus reference individual and identified the closest matching sequence to each contig by alignment to the NCBI non-redundant nucleotide database using BLAST. As expected, many of these contigs represent vertebrate sequence that is absent, incomplete, or misassembled in the UMD3.1 reference assembly. However, numerous additional contigs represent invertebrate species. Most prominent were several species of Spirurid nematodes and a blood-borne parasite, Babesia bigemina. These species are not known to infect taurine cattle and the reference animal appears to have been host to unsequenced sister species. We demonstrate the importance of exploring unmapped reads to ascertain sequences that are either absent or misassembled in the reference assembly and for detecting sequences indicative of infectious or symbiotic organisms.

Local and sex-specific biases in crossover vs. noncrossover outcomes at meiotic recombination hotspots in mouse

Esther de Boer, Maria Jasin, Scott Keeney
doi: http://dx.doi.org/10.1101/022830

Meiotic recombination initiated by programmed double-strand breaks (DSBs) yields two types of interhomolog recombination products, crossovers and noncrossovers, but what determines whether a DSB will yield a crossover or noncrossover is not understood. In this study we analyze the influence of sex and chromosomal location on mammalian recombination outcomes by constructing fine-scale recombination maps in both males and females at two mouse hotspots located in different regions of the same chromosome. These include the most comprehensive maps of recombination hotspots in oocytes to date. One hotspot, located centrally on chromosome 1, behaved similarly in male and female meiosis: crossovers and noncrossovers formed at comparable levels and ratios in both sexes. In contrast, at a distal hotspot crossovers were recovered only in males even though noncrossovers were obtained at similar frequencies in both sexes. These findings reveal an example of extreme sex-specific bias in recombination outcome. We further find that estimates of relative DSB levels are surprisingly poor predictors of relative crossover frequencies between hotspots in males. Our results demonstrate that the outcome of mammalian meiotic recombination can be biased, that this bias can vary depending on location and cellular context, and that DSB frequency is not the only determinant of crossover frequency.

A Profile-Based Method for Measuring the Impact of Genetic Variation

Nicole E Wheeler, Lars Barquist, Fatemeh Ashari Ghomi, Robert A Kingsley, Paul P Gardner
doi: http://dx.doi.org/10.1101/022616

Advances in our ability to generate genome sequence data have increased the need for fast, effective approaches to assessing the functional significance of genetic variation. Traditionally, this has been done by identifying single nucleotide polymorphisms within populations, and calculating derived statistics to prioritize candidates, such as dN/dS. However, these methods commonly ignore the differential selective pressure acting at different positions within a given protein sequence and the effect of insertions and deletions (indels). We present a profile-based method for predicting whether a protein sequence variant is likely to have functionally diverged from close relatives, which takes into account differences in residue conservation and indel rates within a sequence. We assess the performance of the method, and apply it to the identification of functionally significant genetic variation between bacterial genomes. We demonstrate that this method is a highly sensitive measure of functional potential, which can improve our understanding of the evolution of proteins and organisms. An implementation can be found at https://github.com/UCanCompBio/deltaBS.

Tools and techniques for computational reproducibility

Stephen R Piccolo, Adam B Lee, Michael B Frampton
doi: http://dx.doi.org/10.1101/022707

When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. Due to the deterministic nature of most computer programs, the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced, due to complexities in how software is packaged, installed, and executed—and due to limitations in how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges. Here we describe six such strategies. With a broad scientific audience in mind, we describe strengths and limitations of each approach, as well as circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.
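As a small illustration of the determinism point in this abstract, the sketch below shows how fixing a random seed makes a resampling analysis repeat exactly. It is not one of the six strategies from the preprint, which the abstract does not enumerate; the function `bootstrap_means` and the data values are hypothetical.

```python
import random


def bootstrap_means(data, n_resamples, seed):
    """Return bootstrap sample means, using a private seeded generator."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original data.
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return means


# Made-up measurements, used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.0, 4.2]
run_a = bootstrap_means(data, n_resamples=100, seed=42)
run_b = bootstrap_means(data, n_resamples=100, seed=42)
print(run_a == run_b)  # True: same data and same seed give identical output
```

Recording the seed alongside the data and code is one way to let others retrace a stochastic analysis step. Differences in software versions or platforms can still change results, which is the kind of complexity the abstract describes.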

Tanglegrams: a reduction tool for mathematical phylogenetics

Frederick A Matsen IV, Sara Billey, Arnold Kas, Matjaž Konvalinka
(Submitted on 16 Jul 2015)

Many discrete mathematics problems in phylogenetics are defined in terms of the relative labeling of pairs of leaf-labeled trees. These relative labelings are naturally formalized as tanglegrams, which have previously been an object of study in coevolutionary analysis. Although there has been considerable work on planar drawings of tanglegrams, they have not been fully explored as combinatorial objects until recently. In this paper, we describe how many discrete mathematical questions on trees “factor” through a problem on tanglegrams, and how understanding that factoring can simplify analysis. Depending on the problem, it may be useful to consider an unordered version of tanglegrams, and/or their unrooted counterparts. For all of these definitions, we show how the isomorphism types of tanglegrams can be understood in terms of double cosets of the symmetric group, and we investigate their automorphisms. Understanding tanglegrams better will isolate the distinct problems on leaf-labeled pairs of trees and reveal natural symmetries of spaces associated with such problems.

Adaptive variation in human toll-like receptors is contributed by introgression from both Neandertals and Denisovans

Michael Dannemann, Aida M. Andrés, Janet Kelso
doi: http://dx.doi.org/10.1101/022699

Pathogens and the diseases they cause have been among the most important selective forces experienced by humans during their evolutionary history. Although adaptive alleles generally arise by mutation, introgression can also be a valuable source of beneficial alleles. Archaic humans, who lived in Europe and Western Asia for over 200,000 years, were likely well-adapted to the environment and its local pathogens, and it is therefore conceivable that modern humans entering Europe and Western Asia who admixed with them obtained a substantial immune advantage from the introgression of archaic alleles. Here we document a cluster of three toll-like receptors (TLR6-TLR1-TLR10) in modern humans that carries three distinct archaic haplotypes, indicating repeated introgression from archaic humans. Two of these haplotypes are most similar to the Neandertal genome, while the third haplotype is most similar to the Denisovan genome. The toll-like receptors are key components of innate immunity and provide an important first line of immune defense against bacteria, fungi and parasites. The unusually high allele frequencies and unexpected levels of population differentiation indicate that there has been local positive selection on multiple haplotypes at this locus. We show that the introgressed alleles have clear functional effects in modern humans; archaic-like alleles underlie differences in the expression of the TLR genes and are associated with reduced microbial resistance and increased allergic disease in large cohorts. This provides strong evidence for recurrent adaptive introgression at the TLR6-TLR1-TLR10 locus, resulting in differences in disease phenotypes in modern humans.