A simple biophysical model predicts more rapid accumulation of hybrid incompatibilities in small populations

Bhavin S. Khatri, Richard A. Goldstein
Comments: 13 pages, 6 figures
Subjects: Populations and Evolution (q-bio.PE)

Speciation is fundamental to the huge diversity of life on Earth. Evidence suggests reproductive isolation arises most commonly in allopatry with a higher speciation rate in small populations. Current theory does not address this dependence in the important weak mutation regime. Here, we examine a biophysical model of speciation based on the binding of a protein transcription factor to a DNA binding site, and how their independent co-evolution in two allopatric lineages, on a stabilizing landscape, leads to incompatibilities. Our results give a new prediction for the monomorphic regime of evolution, consistent with data, that smaller populations should develop incompatibilities more quickly. This arises for two reasons: 1) smaller populations have a greater initial drift load, as there are more sequences that bind poorly than well, so fewer substitutions are needed to reach incompatible regions of phenotype space; 2) divergence is slower when the population size is larger than the inverse of discrete differences in fitness. Further, we find longer sequences develop incompatibilities more quickly at small population sizes, but more slowly at large population sizes. The biophysical model thus represents a robust mechanism of rapid reproductive isolation for small populations and long sequences that does not require peak-shifts or positive selection.

Analysis of adaptive walks on NK fitness landscapes with different interaction schemes

Stefan Nowak, Joachim Krug
Comments: 29 pages, 9 figures
Subjects: Populations and Evolution (q-bio.PE); Disordered Systems and Neural Networks (cond-mat.dis-nn)

Fitness landscapes are genotype-to-fitness mappings, commonly used in evolutionary biology and computer science, that are closely related to spin glass models. In this paper, we study the NK model for fitness landscapes where the interaction scheme between genes can be explicitly defined. The focus is on how this scheme influences the overall shape of the landscape. Our main tool for the analysis is the adaptive walk, an idealized dynamics by which the population moves uphill in fitness and terminates at a local fitness maximum. We use three different types of walks and investigate how their length (the number of steps required to reach a local peak) and height (the fitness at the endpoint of the walk) depend on the dimensionality and structure of the landscape. We find that the distribution of local maxima over the landscape is particularly sensitive to the choice of interaction pattern. Most quantities that we measure are simply correlated with the rank of the scheme, which is equal to the number of nonzero coefficients in the expansion of the fitness landscape in terms of Walsh functions.
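To make the notions of walk length and height concrete, here is a minimal Python sketch of one adaptive walk on a random NK landscape. It is an illustration under stated assumptions, not the authors' code: the cyclic "adjacent" interaction scheme, the convention that a neighbourhood of size k includes the locus itself, the uniform random fitness contributions, and the greedy (steepest-ascent) rule are all choices made here, and the function names are hypothetical.

```python
import itertools
import random


def make_nk_landscape(n, k, rng):
    """Build a random NK fitness function on binary genotypes of length n.

    Assumption: locus i interacts with itself and the next k-1 loci on a
    ring (an "adjacent" scheme); other interaction schemes would only
    change how `neighbourhoods` is filled in.
    """
    neighbourhoods = [[(i + j) % n for j in range(k)] for i in range(n)]
    # One lookup table of uniform random contributions per locus.
    tables = [
        {bits: rng.random() for bits in itertools.product((0, 1), repeat=k)}
        for _ in range(n)
    ]

    def fitness(genotype):
        total = sum(
            tables[i][tuple(genotype[j] for j in neighbourhoods[i])]
            for i in range(n)
        )
        return total / n

    return fitness


def greedy_adaptive_walk(fitness, start):
    """Flip the single most beneficial site until no flip improves fitness.

    Returns (final genotype, walk length, walk height).
    """
    genotype = list(start)
    steps = 0
    while True:
        current = fitness(genotype)
        best_gain, best_site = 0.0, None
        for site in range(len(genotype)):
            genotype[site] ^= 1                  # try the one-mutant neighbour
            gain = fitness(genotype) - current
            genotype[site] ^= 1                  # undo the trial flip
            if gain > best_gain:
                best_gain, best_site = gain, site
        if best_site is None:                    # local fitness maximum reached
            return genotype, steps, current
        genotype[best_site] ^= 1
        steps += 1
```

Calling `greedy_adaptive_walk(make_nk_landscape(8, 3, random.Random(0)), [0] * 8)` returns the endpoint together with the number of steps taken and the fitness reached. Repeating this over many random landscapes and starting points is one way to estimate how length and height vary with n, k, and the interaction scheme.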

Entire genome transcription across evolutionary time exposes non-coding DNA to de novo gene emergence

Rafik Neme, Diethard Tautz
doi: http://dx.doi.org/10.1101/017152

Even in the best-studied mammalian genomes, less than 5% of the total genome length is annotated as exonic. However, deep sequencing analysis in humans has shown that around 40% of the genome may be covered by poly-adenylated non-coding transcripts occurring at low levels. Their functional significance is unclear, and it has been disputed whether they should be considered noise of the transcriptional machinery. We propose that if such transcripts show some evolutionary stability they will serve as substrates for de novo gene evolution, i.e. gene emergence out of non-coding DNA. Here, we characterize the phylogenetic turnover of low-level poly-adenylated transcripts in a comprehensive sampling of populations, sub-species and species of the genus Mus, spanning a phylogenetic distance of about 10 Myr. We find evidence for more evolutionarily stable gains of transcription than losses among closely related taxa, balanced by a loss of older transcripts across the whole phylogeny. We show that adding taxa increases the genomic transcript coverage and that no major transcript-free islands exist over time. This suggests that the entire genome can be transcribed into poly-adenylated RNA when viewed at an evolutionary time scale. Thus, any part of the “non-coding” genome can become subject to evolutionary functionalization via de novo gene evolution.

MMR: A Tool for Read Multi-Mapper Resolution

Andre Kahles, Jonas Behr, Gunnar Rätsch
doi: http://dx.doi.org/10.1101/017103

Motivation: Mapping high-throughput sequencing data to a reference genome is an essential step for most pipelines aiming at the computational analysis of genome and transcriptome sequencing data. Breaking ties between equally well mapping locations not only poses a severe problem during the alignment phase, but also has a significant impact on the results of downstream analyses. We present the multimapper resolution (MMR) tool that infers optimal mapping locations from the coverage density of other mapped reads. Results: Filtering alignments with MMR can significantly improve the performance of downstream analyses like transcript quantitation and differential testing. We illustrate that the accuracy (Spearman correlation) of transcript quantification increases by 17% when using reads of length 51. In addition, MMR decreases the alignment file sizes by more than 50% and this leads to a reduced running time of the quantification tool. Our efficient implementation of the MMR algorithm is easily applicable as a post-processing step to existing alignment files in BAM format. Its complexity scales linearly with the number of alignments and requires no further inputs. Supplementary Material: Source code and documentation are available for download at http://github.com/ratschlab/mmr. Supplementary text and figures, comprehensive testing results and further information can be found at http://bioweb.me/mmr.
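The idea of choosing among equally good mapping locations by looking at the coverage of other reads can be illustrated with a toy Python sketch. This is a deliberately simplified stand-in and not the MMR algorithm or its interface: it builds coverage only from uniquely mapped reads, scores each candidate location by the summed coverage under the read, and keeps the best-supported one. The function name, the `(chromosome, start)` tuples, and the fixed read length are assumptions made for the example.

```python
from collections import Counter


def resolve_multimappers(unique_alignments, multi_alignments, read_length):
    """Pick one location per multi-mapped read using coverage of other reads.

    unique_alignments: list of (chrom, start) for uniquely mapped reads.
    multi_alignments:  dict read_id -> list of candidate (chrom, start).
    Simplifying assumption: every read covers `read_length` contiguous bases.
    """
    # Per-base coverage contributed by uniquely mapped reads only.
    coverage = Counter()
    for chrom, start in unique_alignments:
        for pos in range(start, start + read_length):
            coverage[(chrom, pos)] += 1

    def support(location):
        # Summed coverage over the bases the candidate alignment would span.
        chrom, start = location
        return sum(
            coverage[(chrom, pos)] for pos in range(start, start + read_length)
        )

    # Ties are broken in favour of the first candidate listed.
    return {
        read_id: max(candidates, key=support)
        for read_id, candidates in multi_alignments.items()
    }


unique = [("chr1", 100), ("chr1", 105), ("chr2", 500)]
multi = {"r1": [("chr2", 498), ("chr1", 102)]}
chosen = resolve_multimappers(unique, multi, read_length=10)
```

In this example the candidate on `chr1` overlaps two uniquely mapped reads while the one on `chr2` overlaps only one, so `chosen["r1"]` is `("chr1", 102)`. A real tool would work on BAM records and would also have to let multi-mapped reads influence one another's placement.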

How complexity originates: The evolution of animal eyes

Todd H Oakley, Daniel I Speiser
doi: http://dx.doi.org/10.1101/017129

Learning how complex traits like eyes originate is fundamental for understanding evolution. Here, we first sketch historical perspectives on trait origins and argue that new technologies offer key new insights. Next, we articulate four open questions about trait origins. To address them, we define a research program to break complex traits into components and study the individual evolutionary histories of those parts. By doing so, we can learn when the parts came together and perhaps understand why they stayed together. We apply the approach to five structural innovations critical for complex eyes, reviewing the history of the parts of each of those innovations. Photoreceptors evolved within animals by bricolage, recombining genes that originated far earlier. Multiple genes used in eyes today had ancestral roles in stress responses. We hypothesize that photo-stress could have increased the chance those genes were expressed together in places on animals where light was abundant.

Large-Scale Search of Transcriptomic Read Sets with Sequence Bloom Trees

Brad Solomon, Carleton Kingsford
doi: http://dx.doi.org/10.1101/017087

Enormous databases of short-read RNA-seq sequencing experiments such as the NIH Sequence Read Archive (SRA) are now available. However, these collections remain difficult to use due to the inability to search for a particular expressed sequence. A natural question is which of these experiments contain sequences that indicate the expression of a particular sequence such as a gene isoform, lncRNA, or uORF. At present, this is a computationally demanding question at the scale of these databases. We introduce an indexing scheme, the Sequence Bloom Tree (SBT), to support sequence-based querying of terabase-scale collections of thousands of short-read sequencing experiments. We apply SBT to the problem of finding conditions under which query transcripts are expressed. Our experiments are conducted on a set of 2652 publicly available RNA-seq experiments contained in the NIH SRA for breast, blood, and brain tissues, comprising 5 terabytes of sequence. SBTs of this size can be queried for a 1000 nt sequence in 19 minutes using less than 300 MB of RAM, over 100 times faster than standard usage of SRA-BLAST and 119 times faster than STAR. SBTs allow for fast identification of experiments with expressed novel isoforms, even if these isoforms were unknown at the time the SBT was built. We also provide some theoretical guidance about appropriate parameter selection in SBT and propose a sampling-based scheme for potentially scaling SBT to even larger collections of files. While SBT can handle any set of reads, we demonstrate the effectiveness of SBT by searching a large collection of blood, brain, and breast RNA-seq files for all 214,293 known human transcripts to identify tissue-specific transcripts. The implementation used in the experiments below is in C++ and is available as open source at http://www.cs.cmu.edu/~ckingsf/software/bloomtree.
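The abstract does not spell out the data structure, so the following Python sketch shows only the general idea of a tree of Bloom filters, under assumptions made here: each leaf filter holds the k-mers of one experiment, each internal filter is the bitwise union of its children, and a query descends only into subtrees whose filter contains at least a fraction `theta` of the query's k-mers. The class names, filter size, hashing scheme, and parameter values are illustrative and are not taken from the authors' C++ implementation.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter backed by a Python integer used as a bit set."""

    def __init__(self, size=4096, num_hashes=2):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def kmers(sequence, k):
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


class Node:
    def __init__(self, name=None, reads=(), children=(), k=5):
        self.name = name
        self.children = list(children)
        self.filter = BloomFilter()
        for read in reads:                 # leaf: index the experiment's k-mers
            for kmer in kmers(read, k):
                self.filter.add(kmer)
        for child in self.children:        # internal node: union of children
            self.filter.bits |= child.filter.bits


def query(node, query_kmers, theta):
    """Return names of leaves holding at least theta of the query k-mers."""
    hits = sum(kmer in node.filter for kmer in query_kmers)
    if hits < theta * len(query_kmers):
        return []                          # prune the whole subtree
    if not node.children:
        return [node.name]
    return [name for child in node.children
            for name in query(child, query_kmers, theta)]


exp_a = Node(name="exp_a", reads=["ACGTACGTTAGC", "TTAGCCGATACA"])
exp_b = Node(name="exp_b", reads=["GGGGCCCCAAAA", "CCAAAATTTTGG"])
root = Node(children=[exp_a, exp_b])
matches = query(root, kmers("ACGTACGTTAGC", 5), theta=0.9)
```

Because Bloom filters have no false negatives and the union keeps every bit set in a child, `matches` is guaranteed to contain `"exp_a"`. It may occasionally contain other experiments as false positives, which is the price of the compact representation.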

Repeatability of evolution on epistatic landscapes

Benedikt Bauer, Chaitanya S Gokhale
doi: http://dx.doi.org/10.1101/016782

Evolution is a dynamic process. The two classical forces of evolution are mutation and selection. Assuming small mutation rates, evolution can be predicted based solely on the fitness differences between phenotypes. Predicting an evolutionary process under varying mutation rates as well as varying fitness is still an open question. Experimental procedures, however, do include these complexities along with fluctuating population sizes and stochastic events such as extinctions. We investigate the mutational path probabilities of systems having epistatic effects on both fitness and mutation rates using a theoretical and computational framework. In contrast to previous models, we do not limit ourselves to the typical strong selection, weak mutation (SSWM) regime or to fixed population sizes. Rather, we allow epistatic interactions to also affect mutation rates. This can lead to qualitatively non-trivial dynamics. Pathways that are negligible in the SSWM regime can overcome fitness valleys and become accessible. This finding has the potential to extend the traditional predictions based on the SSWM foundation and bring us closer to what is observed in experimental systems.