Entire genome transcription across evolutionary time exposes non-coding DNA to de novo gene emergence

Rafik Neme , Diethard Tautz
doi: http://dx.doi.org/10.1101/017152

Even in the best-studied mammalian genomes, less than 5% of the total genome length is annotated as exonic. However, deep sequencing analysis in humans has shown that around 40% of the genome may be covered by poly-adenylated non-coding transcripts occurring at low levels. Their functional significance is unclear, and there has been a dispute over whether they should be considered noise of the transcriptional machinery. We propose that if such transcripts show some evolutionary stability, they will serve as substrates for de novo gene evolution, i.e. gene emergence out of non-coding DNA. Here, we characterize the phylogenetic turnover of low-level poly-adenylated transcripts in a comprehensive sampling of populations, sub-species and species of the genus Mus, spanning a phylogenetic distance of about 10 Myr. We find evidence for more evolutionarily stable gains of transcription than losses among closely related taxa, balanced by a loss of older transcripts across the whole phylogeny. We show that adding taxa increases the genomic transcript coverage and that no major transcript-free islands exist over time. This suggests that the entire genome can be transcribed into poly-adenylated RNA when viewed at an evolutionary time scale. Thus, any part of the “non-coding” genome can become subject to evolutionary functionalization via de novo gene evolution.

MMR: A Tool for Read Multi-Mapper Resolution

Andre Kahles , Jonas Behr , Gunnar Rätsch
doi: http://dx.doi.org/10.1101/017103

Motivation: Mapping high-throughput sequencing data to a reference genome is an essential step for most pipelines aimed at the computational analysis of genome and transcriptome sequencing data. Breaking ties between equally well-mapping locations not only poses a severe problem during the alignment phase but also has a significant impact on the results of downstream analyses. We present the multimapper resolution (MMR) tool that infers optimal mapping locations from the coverage density of other mapped reads. Results: Filtering alignments with MMR can significantly improve the performance of downstream analyses like transcript quantitation and differential testing. We illustrate that the accuracy (Spearman correlation) of transcript quantification increases by 17% when using reads of length 51. In addition, MMR decreases the alignment file sizes by more than 50%, which reduces the running time of the quantification tool. Our efficient implementation of the MMR algorithm is easily applicable as a post-processing step to existing alignment files in BAM format. Its complexity scales linearly with the number of alignments and requires no further inputs. Supplementary Material: Source code and documentation are available for download at http://github.com/ratschlab/mmr. Supplementary text and figures, comprehensive testing results and further information can be found at http://bioweb.me/mmr.
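
To make the idea of resolving multi-mappers from the coverage of other reads concrete, here is a minimal sketch of a greedy coverage-based tie-breaker. It is a simplified stand-in and not the MMR algorithm itself: the function name `resolve_multimappers`, the input layout (plain start positions on a single reference), and the "pick the best-supported candidate" rule are all assumptions made for illustration.

```python
from collections import Counter


def resolve_multimappers(unique_starts, multi, read_len):
    """Pick one location per multi-mapped read (hypothetical helper).

    unique_starts: start positions of uniquely mapped reads.
    multi: dict mapping read id -> list of candidate start positions.
    Returns a dict mapping read id -> chosen start position.
    """
    # Per-base coverage contributed by uniquely mapped reads.
    coverage = Counter()
    for start in unique_starts:
        for pos in range(start, start + read_len):
            coverage[pos] += 1

    def support(start):
        # Total coverage from other reads under this candidate alignment.
        return sum(coverage[pos] for pos in range(start, start + read_len))

    chosen = {}
    for read_id, candidates in multi.items():
        # Assumed rule: keep the candidate with the most supporting coverage.
        best = max(candidates, key=support)
        chosen[read_id] = best
        # The resolved read now contributes coverage for later decisions.
        for pos in range(best, best + read_len):
            coverage[pos] += 1
    return chosen
```

A single greedy pass like this depends on read order; a tool operating on real BAM files would also need to handle multiple chromosomes, spliced alignments and paired reads, none of which is modeled here.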

How complexity originates: The evolution of animal eyes

Todd H Oakley , Daniel I Speiser
doi: http://dx.doi.org/10.1101/017129

Learning how complex traits like eyes originate is fundamental for understanding evolution. Here, we first sketch historical perspectives on trait origins and argue that new technologies offer key new insights. Next, we articulate four open questions about trait origins. To address them, we define a research program to break complex traits into components and study the individual evolutionary histories of those parts. By doing so, we can learn when the parts came together and perhaps understand why they stayed together. We apply the approach to five structural innovations critical for complex eyes, reviewing the history of the parts of each of those innovations. Photoreceptors evolved within animals by bricolage, recombining genes that originated far earlier. Multiple genes used in eyes today had ancestral roles in stress responses. We hypothesize that photo-stress could have increased the chance those genes were expressed together in places on animals where light was abundant.

Large-Scale Search of Transcriptomic Read Sets with Sequence Bloom Trees

Brad Solomon , Carleton Kingsford
doi: http://dx.doi.org/10.1101/017087

Enormous databases of short-read RNA-seq experiments such as the NIH Sequence Read Archive (SRA) are now available. However, these collections remain difficult to use due to the inability to search for a particular expressed sequence. A natural question is which of these experiments contain sequences that indicate the expression of a particular sequence such as a gene isoform, lncRNA, or uORF. However, at present this is a computationally demanding question at the scale of these databases. We introduce an indexing scheme, the Sequence Bloom Tree (SBT), to support sequence-based querying of terabase-scale collections of thousands of short-read sequencing experiments. We apply SBT to the problem of finding conditions under which query transcripts are expressed. Our experiments are conducted on a set of 2652 publicly available RNA-seq experiments contained in the NIH SRA for breast, blood, and brain tissues, comprising 5 terabytes of sequence. SBTs of this size can be queried for a 1000 nt sequence in 19 minutes using less than 300 MB of RAM, over 100 times faster than standard usage of SRA-BLAST and 119 times faster than STAR. SBTs allow for fast identification of experiments with expressed novel isoforms, even if these isoforms were unknown at the time the SBT was built. We also provide some theoretical guidance about appropriate parameter selection in SBT and propose a sampling-based scheme for potentially scaling SBT to even larger collections of files. While SBT can handle any set of reads, we demonstrate the effectiveness of SBT by searching a large collection of blood, brain, and breast RNA-seq files for all 214,293 known human transcripts to identify tissue-specific transcripts. The implementation used in the experiments below is in C++ and is available as open source at http://www.cs.cmu.edu/~ckingsf/software/bloomtree.
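
The general shape of a tree of Bloom filters can be sketched in a few lines. The toy version below assumes that each leaf stores the k-mers of one experiment, each internal node stores the union of its children, and a query descends only into nodes that contain at least a fraction `theta` of the query k-mers. The class and function names, the pairwise tree construction, and the hash scheme are illustrative assumptions, not the authors' C++ implementation.

```python
import hashlib


class Bloom:
    """Tiny Bloom filter; a Python int serves as the bit set."""

    def __init__(self, m=1 << 16, h=3, bits=0):
        self.m, self.h, self.bits = m, h, bits

    def _positions(self, kmer):
        # h salted SHA-256 hashes mapped onto m bit positions.
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, kmer):
        for p in self._positions(kmer):
            self.bits |= 1 << p

    def __contains__(self, kmer):
        return all((self.bits >> p) & 1 for p in self._positions(kmer))

    def union(self, other):
        return Bloom(self.m, self.h, self.bits | other.bits)


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class Node:
    def __init__(self, bloom, name=None, left=None, right=None):
        self.bloom, self.name, self.left, self.right = bloom, name, left, right


def build_tree(experiments, k):
    """experiments: non-empty dict of experiment name -> list of reads."""
    nodes = []
    for name, reads in experiments.items():
        bloom = Bloom()
        for read in reads:
            for kmer in kmers(read, k):
                bloom.add(kmer)
        nodes.append(Node(bloom, name=name))
    # Assumed simplification: pair neighbours bottom-up until one root remains.
    while len(nodes) > 1:
        merged = [Node(a.bloom.union(b.bloom), left=a, right=b)
                  for a, b in zip(nodes[0::2], nodes[1::2])]
        if len(nodes) % 2:
            merged.append(nodes[-1])
        nodes = merged
    return nodes[0]


def query(node, query_kmers, theta):
    """Return names of leaves holding at least theta of the query k-mers."""
    hits = sum(kmer in node.bloom for kmer in query_kmers)
    if hits < theta * len(query_kmers):
        return []  # prune: no experiment below this node can pass
    if node.name is not None:
        return [node.name]
    return (query(node.left, query_kmers, theta)
            + query(node.right, query_kmers, theta))
```

Pruning is safe because an internal filter is a superset of every filter beneath it: if the union fails the threshold, so does each leaf. Bloom filters can still report false positives, so hits in a sketch like this would need confirmation against the underlying reads.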

Adaptation, Clonal Interference, and Frequency-Dependent Interactions in a Long-Term Evolution Experiment with Escherichia coli

Rohan Maddamsetti , Richard E. Lenski , Jeffrey E. Barrick
doi: http://dx.doi.org/10.1101/017020

Twelve replicate populations of Escherichia coli have been evolving in the laboratory for more than 25 years and 60,000 generations. We analyzed bacteria from whole-population samples frozen every 500 generations through 20,000 generations for one well-studied population, called Ara-1. By tracking 42 known mutations in these samples, we reconstructed the history of this population's genotypic evolution over this period. The evolutionary dynamics of Ara-1 show strong evidence of selective sweeps as well as clonal interference between competing lineages bearing different beneficial mutations. In some cases, sets of several mutations approached fixation simultaneously, often conveying no information about their order of origination; we present several possible explanations for the existence of these mutational cohorts. Against a backdrop of rapid selective sweeps both earlier and later, we found that two clades coexisted for over 6000 generations before one drove the other extinct. In that time, at least nine mutations arose in the clade that prevailed. We found evidence that the clades evolved a frequency-dependent interaction, which prevented the competitive exclusion of either clade, but which eventually collapsed as beneficial mutations accumulated in the clade that prevailed. Clonal interference and frequency dependence can occur even in the simplest microbial populations. Furthermore, frequency dependence may generate dynamics that extend the period of coexistence that would otherwise be sustained by clonal interference alone.

Threshold trait architecture of Hsp90-buffered variation

Charles C Carey , Kristen F Gorman , Becky Howsmon , Charles Kooperberg , Aaron K Aragaki , Suzannah Rutherford
doi: http://dx.doi.org/10.1101/016980

Common genetic variants buffered by Hsp90 are candidates for human diseases of signaling such as cancer. Like cancer, morphological abnormalities buffered by Hsp90 are discrete threshold traits with a continuous underlying basis of liability determining their probability of occurrence. QTL and deletion maps for one of the most frequent Hsp90-dependent abnormalities in Drosophila, deformed eye (dfe), were replicated across three genetically related artificial selection lines using strategies dependent on proximity to the dfe threshold and the direction of genetic and environmental effects. Up to 17 dfe loci (QTL) linked by 7 interactions were detected based on the ability of small recombinant regions of an unaffected and completely homozygous control genotype to dominantly suppress or enhance dfe penetrance at its threshold in groups of isogenic recombinant flies, and over 20 deletions increased dfe penetrance from a low expected value in one or more lines, identifying a complex network of genes responsible for the dfe phenotype. Replicated comparisons of these whole-genome mapping approaches identified several QTL regions narrowly defined by deletions and 4 candidate genes, with additional uncorrelated QTL and deletions highlighting differences between the approaches and the need for caution in attributing the effect of deletions directly to QTL genes.

RNAseq in the mosquito maxillary palp: a little antennal RNA goes a long way

David C. Rinker , Xiaofan Zhou , Ronald Jason Pitts , Antonis Rokas , LJ Zwiebel
doi: http://dx.doi.org/10.1101/016998

A comparative transcriptomic study of mosquito olfactory tissues recently published in BMC Genomics (Hodges et al., 2014) reported several novel findings that have broad implications for the field of insect olfaction. In this brief commentary, we outline why the conclusions of Hodges et al. are problematic under the current models of insect olfaction and then contrast their findings with those of other RNAseq-based studies of mosquito olfactory tissues. We also generated a new RNAseq data set from the maxillary palp of Anopheles gambiae in an effort to replicate the novel results of Hodges et al. but were unable to reproduce their results. Instead, our new RNAseq data support the more straightforward explanation that the novel findings of Hodges et al. were a consequence of contamination by antennal RNA. In summary, we find strong evidence to suggest that the conclusions of Hodges et al. were spurious, and that at least some of their RNAseq data sets were irrevocably compromised by cross-contamination between samples.

Selective strolls: fixation and extinction in diploids are slower for weakly selected mutations than for neutral ones

Fabrizio Mafessoni , Michael Lachmann
doi: http://dx.doi.org/10.1101/016881

In finite populations, an allele disappears or reaches fixation due to two main forces, selection and drift. Selection is generally thought to accelerate the process: a selected mutation will reach fixation faster than a neutral one, and a disadvantageous one will quickly disappear from the population. We show that even in simple diploid populations, this is often not true. Dominance and recessivity unexpectedly slow down the evolutionary process for weakly selected alleles. In particular, slightly advantageous dominant and mildly deleterious recessive mutations reach fixation more slowly than neutral ones. This phenomenon determines genetic signatures opposite to those expected under strong selection, such as increased instead of decreased genetic diversity around the selected site. Furthermore, we characterize a new phenomenon: mildly deleterious recessive alleles, thought to represent the vast majority of newly arising mutations, survive in a population longer than neutral ones before being lost. Hence, natural selection is less effective than previously thought at rapidly removing slightly negative mutations, contributing to their observed persistence in present populations. Consequently, low-frequency slightly deleterious mutations are on average older than neutral ones.
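
Claims about fixation and loss times in diploids can be explored with a small Wright-Fisher simulation. The sketch below assumes the standard textbook parameterization (genotype fitnesses 1 + s, 1 + hs and 1 for AA, Aa and aa, random mating, binomial sampling of 2N gametes each generation); the function names and this particular model are assumptions for illustration and are not taken from the paper.

```python
import random


def selected_freq(p, s, h):
    """Frequency of allele A after one round of viability selection."""
    q = 1.0 - p
    # Mean fitness with AA = 1 + s, Aa = 1 + h*s, aa = 1.
    w_bar = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
    return (p * p * (1 + s) + p * q * (1 + h * s)) / w_bar


def simulate(N, s, h, rng):
    """Follow one new mutation in N diploids until loss or fixation.

    Returns (fixed, generations).
    """
    copies, t = 1, 0
    while 0 < copies < 2 * N:
        p = selected_freq(copies / (2 * N), s, h)
        # Drift: binomial sampling of 2N gametes, drawn one at a time.
        copies = sum(rng.random() < p for _ in range(2 * N))
        t += 1
    return copies == 2 * N, t


def mean_times(N, s, h, runs, seed=0):
    """Mean conditional fixation time and mean loss time over many runs."""
    rng = random.Random(seed)
    fix_times, loss_times = [], []
    for _ in range(runs):
        fixed, t = simulate(N, s, h, rng)
        (fix_times if fixed else loss_times).append(t)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(fix_times), mean(loss_times)
```

Comparing `mean_times(N, 0.0, 0.5, runs)` with runs using a small positive `s` and `h` near 1, or a small negative `s` and `h` near 0, is one way to probe the regimes the abstract describes. Fixation of a single new mutation is rare, so a large number of runs is needed before the conditional fixation times become stable.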

Variation in rural African gut microbiomes is strongly shaped by parasitism and diet

Elise R Morton , Joshua Lynch , Alain Froment , Sophie Lafosse , Evelyne Heyer , Molly Przeworski , Ran Blekhman , Laure Segurel
doi: http://dx.doi.org/10.1101/016949

The human gut microbiome is influenced by its host’s nutrition and health status, and represents an interesting adaptive phenotype under the influence of metabolic and immune constraints. Previous studies contrasting rural populations in developing countries to urban industrialized ones have shown that geography is an important factor associated with the gut microbiome; however, studies have yet to disentangle the effects of factors such as climate, diet, host genetics, hygiene and parasitism. Here, we focus on fine-scale comparisons of African rural populations in order to (i) contrast the gut microbiomes of populations that inhabit similar environments but have different traditional subsistence modes and (ii) evaluate the effect of parasitism on microbiome composition and structure. We sampled rural Pygmy hunter-gatherers as well as Bantu individuals from both farming and fishing populations in Southwest Cameroon and found that the presence of Entamoeba is strongly correlated with microbial composition and diversity. Using a random forest classifier model, we show that an individual’s infection status can be predicted with 79% accuracy based on his/her gut microbiome composition. We identified multiple taxa that differ significantly in frequency between infected and uninfected individuals, and found that alpha diversity is significantly higher in infected individuals, while beta-diversity is reduced. Subsistence mode was another factor significantly associated with microbial composition, notably with some taxa previously shown to differ between Hadza East African hunter-gatherers and Italians also discriminating Pygmy hunter-gatherers from neighboring farming or fishing populations in Cameroon. In conclusion, these results provide evidence for a strong relationship between human gut parasites and the microbiome, and highlight how sensitive this microbial ecosystem is to subtle changes in host nutrition.

The origins of a novel butterfly wing patterning gene from within a family of conserved cell cycle regulators

Nicola Nadeau , Carolina Pardo-Diaz , Annabel Whibley , Megan Ann Supple , Richard Wallbank , Grace C. Wu , Luana Maroja , Laura Ferguson , Heather Hines , Camilo Salazar , Richard ffrench-Constant , Mathieu Joron , William Owen McMillan , Chris Jiggins
doi: http://dx.doi.org/10.1101/016006

A major challenge in evolutionary biology is to understand the origins of novel structures. The wing patterns of butterflies and moths are derived phenotypes unique to the Lepidoptera. Here we identify a gene that we name poikilomousa (poik), which regulates colour pattern switches in the mimetic Heliconius butterflies. Strong associations between phenotypic variation and DNA sequence variation are seen in three different Heliconius species, in addition to associations between gene expression and colour pattern. Colour pattern variants are also associated with differences in splicing of poik transcripts. poik is a member of the conserved fizzy family of cell cycle regulators. It belongs to a faster evolving subfamily, the closest functionally characterised orthologue being the cortex gene in Drosophila, a female germ-line specific protein involved in meiosis. poik appears to have adopted a novel function in the Lepidoptera and become a major target for natural selection acting on colour and pattern variation in this group.