Plant reproductive development is characterised by a transcriptomic evolutionary bulge

Toni I Gossmann, Dounia Saleh, Marc W Schmid, Michael A Spence, Karl Schmid
doi: http://dx.doi.org/10.1101/022939

Reproductive traits in plants tend to evolve rapidly due to various causes that include plant-pollinator coevolution and pollen competition, but the genomic basis of reproductive trait evolution is still largely unknown. To characterise evolutionary patterns of genome-wide gene expression in reproductive tissues and to compare them to developmental stages of the sporophyte, we analysed evolutionary conservation and genetic diversity of protein-coding genes using microarray-based transcriptome data from three plant species, Arabidopsis thaliana, rice (Oryza sativa) and soybean (Glycine max). In all three species a significant shift in gene expression occurs during gametogenesis in which genes of younger evolutionary age and higher genetic diversity contribute significantly more to the transcriptome than in other stages. We refer to this phenomenon as an ‘evolutionary bulge’ during plant reproductive development because it differentiates the gametophyte from the sporophyte. The extent of the bulge pattern is much stronger than the transcriptomic hourglass, which postulates that during early embryo development an increased proportion of ancient and conserved genes contribute to the total transcriptome. In the three plant species, we observed an hourglass pattern only in A. thaliana but not in rice or soybean, which suggests that unlike the evolutionary bulge of reproductive genes the transcriptomic hourglass is not a general pattern of plant embryogenesis, which is consistent with the absence of a morphologically defined phylotypic stage in plant development.
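
To make "genes of younger evolutionary age contribute more to the transcriptome" concrete, one possible summary statistic is an expression-weighted mean of gene age for each developmental stage. The abstract does not say which statistic the authors used, so the sketch below is only an illustration: the function name, the age-rank convention (higher rank means younger gene) and the toy expression values are all assumptions.

```python
from typing import Mapping


def weighted_mean_age(expression: Mapping[str, float],
                      age_rank: Mapping[str, int]) -> float:
    """Expression-weighted mean of gene age ranks for one stage.

    Assumed convention: a higher rank means an evolutionarily younger gene.
    Genes without an age assignment are ignored.
    """
    shared = [gene for gene in expression if gene in age_rank]
    total = sum(expression[gene] for gene in shared)
    if total <= 0:
        raise ValueError("no expressed genes with an age assignment")
    return sum(expression[gene] * age_rank[gene] for gene in shared) / total


# Hypothetical toy data: three genes measured in two stages.
age_rank = {"geneA": 1, "geneB": 2, "geneC": 5}
stages = {
    "seedling": {"geneA": 80.0, "geneB": 15.0, "geneC": 5.0},
    "pollen": {"geneA": 20.0, "geneB": 20.0, "geneC": 60.0},
}

index_by_stage = {
    name: weighted_mean_age(expr, age_rank) for name, expr in stages.items()
}
for name, value in index_by_stage.items():
    print(f"{name}: {value:.2f}")
```

In this toy example the pollen stage has the higher index because the young gene dominates its expression, which is the kind of shift a "bulge" would show up as when the index is plotted across stages.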

First-step Mutations during Adaptation to Thermal Stress Shift the Expression of Thousands of Genes Back toward the Pre-stressed State

Alejandra Rodriguez-Verdugo, Olivier Tenaillon, Brandon Gaut
doi: http://dx.doi.org/10.1101/022905

The temporal change of phenotypes during the adaptive process remains largely unexplored, as do the genetic changes that affect these phenotypic changes. Here we focused on three mutations that rose to high frequency in the early stages of adaptation within 12 Escherichia coli populations subjected to thermal stress (42°C). All of the mutations were in the rpoB gene, which encodes the RNA polymerase beta subunit. For each mutation, we measured the growth curves and gene expression (mRNAseq) of clones at 42°C. We also compared growth and gene expression to their ancestor under unstressed (37°C) and stressed conditions (42°C). Each of the three mutations changed the expression of hundreds of genes and conferred large fitness advantages, apparently through the restoration of global gene expression from the stressed towards the pre-stressed state. Finally, we compared the phenotypic characteristics of one mutant, I572L, to two high-temperature adapted clones that have this mutation plus additional background mutations. The background mutations increased fitness, but they did not substantially change gene expression. We conclude that early mutations in a global transcriptional regulator cause extensive changes in gene expression, many of which are likely under positive selection for their effect in restoring the pre-stress physiology.

Purging of deleterious variants in Italian founder populations with extended autozygosity

Massimiliano Cocca, Marc Pybus, Pier Francesco Palamara, Erik Garrison, Michela Traglia, Cinzia F Sala, Sheila Ulivi, Yasin Memari, Anja Kolb-Kokocinski, Richard Durbin, Paolo Gasparini, Daniela Toniolo, Nicole Soranzo, Vincenza Colonna
doi: http://dx.doi.org/10.1101/022947

Purging through inbreeding is the process through which deleterious alleles can be removed from populations by natural selection when exposed in homozygosis through the occurrence of consanguineous marriage. In this study we carried out low read depth (4-10x) whole-genome sequencing in 568 individuals from three Italian founder populations, and compared the results to data from other Italian and European populations from the 1000 Genomes Project. We show depletion of homozygous genotypes at potentially detrimental sites in the founder populations compared to outbred populations and observe patterns consistent with consanguinity driving the accelerated purging of highly deleterious mutations.

Author post: Adaptive evolution is substantially impeded by Hill-Robertson interference in Drosophila

This guest post is by David Castellano and Adam Eyre-Walker on their preprint (with co-authors) Adaptive evolution is substantially impeded by Hill-Robertson interference in Drosophila.

Our paper “Adaptive evolution is substantially impeded by Hill-Robertson interference in Drosophila”, in which we investigate the role of both the rate of recombination and the mutation rate on the rate of adaptive amino acid substitutions, has been available at bioRxiv (http://dx.doi.org/10.1101/021600) since 27 June.

Population genetics theory predicts that the rate of adaptive evolution should depend upon the rate of recombination; genes with low rates of recombination will suffer from Hill-Robertson interference (HRi), in which selected mutations interfere with each other (see the figure below). A newly arising advantageous mutation may find itself in competition for fixation with another advantageous mutation at a linked locus on another chromosome in the population, or in linkage disequilibrium with deleterious mutations, which will reduce its probability of fixation if it cannot recombine away from them.

A schematic HRi example among adaptive alleles (left) and among adaptive and deleterious alleles (right).

Likewise, it is expected that genes with higher mutation rates will undergo more adaptive evolution than genes with low mutation rates. More interestingly, an interaction between the rate of recombination and the rate of mutation is also expected; HRi should be more prevalent in genes with high mutation rates and low rates of recombination. No attempt has been made so far to quantify the overall impact of HRi on the rate of adaptive evolution for any given genome. In our paper we propose a way to quantify the number of adaptive substitutions lost due to HRi: approximately 27% of all adaptive mutations that would have gone to fixation since the split of D. melanogaster and D. yakuba if there were free recombination are lost due to HRi. Moreover, we are able to estimate how the fraction of adaptive amino acid substitutions lost to HRi depends on a gene’s mutation rate. In agreement with our expectations, genes with high mutation rates lose a significantly higher proportion of adaptive substitutions than genes with low mutation rates (43% vs 11%, respectively).
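
The arithmetic behind a "fraction lost" figure can be sketched as follows, assuming one already has an observed count of adaptive substitutions and an estimate of the count expected under free recombination. How the paper obtains that expectation is not described in this post, so the function name and the counts below are hypothetical; the counts were chosen only so that the output matches the percentages quoted above and are not the paper's data.

```python
def fraction_lost(observed: float, expected_free_recombination: float) -> float:
    """Fraction of adaptive substitutions lost relative to free recombination."""
    if expected_free_recombination <= 0:
        raise ValueError("expected count must be positive")
    return 1.0 - observed / expected_free_recombination


# Hypothetical counts per 100 expected adaptive substitutions.
# They are illustrative only, picked to reproduce 27%, 43% and 11%.
gene_classes = {
    "all genes": (73.0, 100.0),
    "high mutation rate": (57.0, 100.0),
    "low mutation rate": (89.0, 100.0),
}

lost_by_class = {
    name: fraction_lost(observed, expected)
    for name, (observed, expected) in gene_classes.items()
}
for name, lost in lost_by_class.items():
    print(f"{name}: {lost:.0%} lost")
```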

An open question is to what extent HRi affects rates of adaptive evolution in other species. Moreover, the loss of adaptive substitutions to HRi can potentially tell us something important about the strength of selection acting on some advantageous mutations, since weakly selected mutations are those that are most likely to be affected by HRi. This will require further analysis and population genetic modeling, but in combination with other sources of information (for example, the dip in diversity around non-synonymous substitutions, the site frequency spectrum, and the high-frequency variants that are left by selective sweeps) it may be possible to infer much more about the distribution of fitness effects (DFE) of advantageous mutations than previously thought.

It will be of great interest to do similar analyses to those performed here in other species.

Comments very welcome!
David and Adam

MuCor: Mutation Aggregation and Correlation

Karl W Kroll, Ann-Katherin Eisfeld, Gerard Lozanski, Clara D Bloomfield, John C Byrd, James S Blachly
doi: http://dx.doi.org/10.1101/022780

Motivation: There are many tools for variant calling and effect prediction, but little to tie together large sample groups. Aggregating, sorting, and summarizing variants and effects across a cohort is often done with ad hoc scripts that must be re-written for every new project. In response, we have written MuCor, a tool to gather variants from a variety of input formats (including multiple files per sample), perform database lookups and frequency calculations, and write many report types. In addition to use in large studies with numerous samples, MuCor can also be employed to directly compare variant calls from the same sample across two or more platforms, parameters, or pipelines. A companion utility, DepthGauge, measures coverage at regions of interest to increase confidence in calls. Availability: Source code is freely available at https://github.com/blachlylab. Contact: james.blachly@osumc.edu. Supplementary data: Supplementary data, including detailed documentation, are available online.
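
As a rough picture of the kind of cohort-level aggregation described above, the sketch below groups variant calls by position and allele and reports which samples carry each one and at what frequency. It is not MuCor's code or interface; the sample names, the variant tuples and the `aggregate` function are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical per-sample calls, each variant as (chrom, pos, ref, alt).
calls = {
    "sample1": [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")],
    "sample2": [("chr1", 100, "A", "G")],
    "sample3": [("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")],
}


def aggregate(calls_by_sample):
    """Map each variant to its carrier samples and cohort frequency."""
    carriers = defaultdict(set)
    for sample, variants in calls_by_sample.items():
        for variant in variants:
            carriers[variant].add(sample)
    n_samples = len(calls_by_sample)
    return {
        variant: {
            "samples": sorted(samples),
            "frequency": len(samples) / n_samples,
        }
        for variant, samples in carriers.items()
    }


summary = aggregate(calls)
for variant, info in sorted(summary.items()):
    print(variant, info["samples"], f"{info['frequency']:.2f}")
```

The same structure works for comparing pipelines on one sample: treat each pipeline's output as a "sample" and look for variants whose frequency is below 1.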

Adaptive divergence in the bovine genome

William Barendse, Sean McWilliam, Rowan J Bunch, Blair E Harrison
doi: http://dx.doi.org/10.1101/022764

Cattle diverged during the Pleistocene into two subspecies, one in temperate and one in tropical environments. Here we have used next generation sequencing of the indicine subspecies of cattle and compared it to the taurine subspecies. Although 23.8 million single nucleotide polymorphisms (SNPs) were found, the number of fixed amino acid substitutions between the taurine and indicine subspecies was low and consistent with the Haldane predictions for adaptive selection rather than with Neutral Theory. We noted 33 regions of enhanced divergence of nonsynonymous SNPs between the subspecies, which included an increased rate of deleterious variants. Signals of positive selection were found for genes associated with immunity, including the Bovine Major Histocompatibility Complex, which also showed an increased rate of deleterious amino acid variants. The genes important in sensing the environment, especially the olfactory system, showed a network-wide signal of positive selection.

What’s in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual

Lynsey K. Whitacre, Polyana C. Tizioto, JaeWoo Kim, Tad S. Sonstegard, Steven G. Schroeder, Leeson J. Alexander, Juan F. Medrano, Robert D. Schnabel, Jeremy F. Taylor, Jared E. Decker
doi: http://dx.doi.org/10.1101/022731

Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain unmapped. We generated de novo assemblies of unmapped reads from the DNA and RNA sequencing of the Bos taurus reference individual and identified the closest matching sequence to each contig by alignment to the NCBI non-redundant nucleotide database using BLAST. As expected, many of these contigs represent vertebrate sequence that is absent, incomplete, or misassembled in the UMD3.1 reference assembly. However, numerous additional contigs represent invertebrate species. Most prominent were several species of Spirurid nematodes and a blood-borne parasite, Babesia bigemina. These species are not known to infect taurine cattle and the reference animal appears to have been host to unsequenced sister species. We demonstrate the importance of exploring unmapped reads to ascertain sequences that are either absent or misassembled in the reference assembly and for detecting sequences indicative of infectious or symbiotic organisms.
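
As a small illustration of the first step in such an analysis, unmapped reads can be identified in SAM-format alignments by testing bit 0x4 of the FLAG field, which the SAM specification defines as "segment unmapped". The sketch below is not the authors' pipeline; the function name and the toy records are made up for the example.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_lines):
    """Return the names of reads whose FLAG has the unmapped bit set."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if int(fields[1]) & UNMAPPED_FLAG:
            names.append(fields[0])
    return names


# Toy SAM records (hypothetical reads, for illustration only).
sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\tIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGCAT\tIIIII",
]

unmapped = unmapped_read_names(sam)
print(unmapped)
```

Reads collected this way would then be the input to de novo assembly and database searches of the kind the abstract describes.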

Local and sex-specific biases in crossover vs. noncrossover outcomes at meiotic recombination hotspots in mouse

Esther de Boer, Maria Jasin, Scott Keeney
doi: http://dx.doi.org/10.1101/022830

Meiotic recombination initiated by programmed double-strand breaks (DSBs) yields two types of interhomolog recombination products, crossovers and noncrossovers, but what determines whether a DSB will yield a crossover or noncrossover is not understood. In this study we analyze the influence of sex and chromosomal location on mammalian recombination outcomes by constructing fine-scale recombination maps in both males and females at two mouse hotspots located in different regions of the same chromosome. These include the most comprehensive maps of recombination hotspots in oocytes to date. One hotspot, located centrally on chromosome 1, behaved similarly in male and female meiosis: crossovers and noncrossovers formed at comparable levels and ratios in both sexes. In contrast, at a distal hotspot crossovers were recovered only in males even though noncrossovers were obtained at similar frequencies in both sexes. These findings reveal an example of extreme sex-specific bias in recombination outcome. We further find that estimates of relative DSB levels are surprisingly poor predictors of relative crossover frequencies between hotspots in males. Our results demonstrate that the outcome of mammalian meiotic recombination can be biased, that this bias can vary depending on location and cellular context, and that DSB frequency is not the only determinant of crossover frequency.

A Profile-Based Method for Measuring the Impact of Genetic Variation

Nicole E Wheeler, Lars Barquist, Fatemeh Ashari Ghomi, Robert A Kingsley, Paul P Gardner
doi: http://dx.doi.org/10.1101/022616

Advances in our ability to generate genome sequence data have increased the need for fast, effective approaches to assessing the functional significance of genetic variation. Traditionally, this has been done by identifying single nucleotide polymorphisms within populations, and calculating derived statistics to prioritize candidates, such as dN/dS. However, these methods commonly ignore the differential selective pressure acting at different positions within a given protein sequence and the effect of insertions and deletions (indels). We present a profile-based method for predicting whether a protein sequence variant is likely to have functionally diverged from close relatives, which takes into account differences in residue conservation and indel rates within a sequence. We assess the performance of the method, and apply it to the identification of functionally significant genetic variation between bacterial genomes. We demonstrate that this method is a highly sensitive measure of functional potential, which can improve our understanding of the evolution of proteins and organisms. An implementation can be found at https://github.com/UCanCompBio/deltaBS.

Tools and techniques for computational reproducibility

Stephen R Piccolo, Adam B Lee, Michael B Frampton
doi: http://dx.doi.org/10.1101/022707

When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. Due to the deterministic nature of most computer programs, the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced, due to complexities in how software is packaged, installed, and executed—and due to limitations in how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges. Here we describe six such strategies. With a broad scientific audience in mind, we describe strengths and limitations of each approach, as well as circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.