Methods for distinguishing between protein-coding and long noncoding RNAs and the elusive biological purpose of translation of long noncoding RNAs

Gali Housman , Igor Ulitsky
doi: http://dx.doi.org/10.1101/017889

Long noncoding RNAs (lncRNAs) are a diverse class of RNAs with increasingly appreciated functions in vertebrates, yet much of their biology remains poorly understood. In particular, it is unclear to what extent the current catalog of over 10,000 distinct annotated lncRNAs is indeed devoid of genes coding for proteins. Here we review the available computational and experimental schemes for distinguishing between protein-coding and long noncoding RNAs, and their recent genome-wide applications. We conclude that the model most consistent with available data is that a large number of mammalian lncRNAs undergo translation, but only a very small minority of such translation events result in stable and functional peptides. The outcome of the majority of the translation events and their potential biological purposes remain an intriguing topic for future investigation.

Predicting Carriers of Ongoing Selective Sweeps Without Knowledge of the Favored Allele

Roy Ronen , Glenn Tesler , Ali Akbari , Shay Zakov , Noah A Rosenberg , Vineet Bafna

Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory — for example, in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.
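The abstract does not spell out how a HAF score is computed. As a rough illustration only, the sketch below assumes the score of a haplotype is the sum, over the derived alleles it carries, of how many haplotypes in the sample carry each of those alleles. The function name `haf_scores` and the toy sample are hypothetical, and this is not the PreCIOSS algorithm itself.

```python
from typing import List


def haf_scores(haplotypes: List[List[int]]) -> List[int]:
    """Score each haplotype (rows = haplotypes, columns = sites, 1 = derived).

    ASSUMPTION: the score is the sum of sample-wide derived-allele counts
    over the sites where the haplotype itself carries the derived allele.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    # Derived-allele count at each site across the whole sample.
    counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    # Add up the counts for the derived alleles each haplotype carries.
    return [
        sum(c for allele, c in zip(h, counts) if allele == 1)
        for h in haplotypes
    ]


# Toy sample: four haplotypes, four biallelic sites (hypothetical data).
sample = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
scores = haf_scores(sample)
print(scores)
```

Under this assumed definition, haplotypes that share many common derived alleles score higher than a haplotype carrying only a private variant, which is the kind of contrast a carrier-prediction step could exploit.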

Adaptive evolution of anti-viral siRNAi genes in bumblebees

Sophie Helbing , Michael Lattorff
doi: http://dx.doi.org/10.1101/017681

The high density of frequently interacting and closely related individuals in social insects enhances pathogen transmission and establishment within colonies. Group-mediated behavior supporting immune defenses tends to decrease selection acting on immune genes. Along with low effective population sizes, this will result in relaxed constraint and rapid evolution of genes of the immune system. Here we show that sociality is the main driver of selection in antiviral siRNAi genes in social bumblebees compared to their socially parasitic cuckoo bumblebees that lack a worker caste. RNAi genes show frequent positive selection at the codon level, additionally supported by the occurrence of parallel evolution, and their evolutionary rate is linked to their pathway-specific position, with genes directly interacting with viruses showing the highest rates of molecular evolution. We suggest that higher pathogen load in social insects indeed drives adaptive evolution of immune genes, if not compensated by behavior.

Simple genetic models for autism spectrum disorder

Swagatam Mukhopadhyay , Michael Wigler , Dan Levy
doi: http://dx.doi.org/10.1101/017301

Exploring the interplay between new mutation, transmission, and gender bias in genetic disease requires formal quantitative modeling. Autism spectrum disorders offer an ideal case: they are genetic in origin, complex, and show a gender bias. The high reproductive costs of autism ensure that most strongly associated genetic mutations are short-lived, and indeed the disease exhibits both transmitted and de novo components. There is a large body of both epidemiologic and genomic data that greatly constrains the genetic mechanisms that may contribute to the disorder. We develop a computational framework that assumes classes of additive variants, each member of a class having equal effect. We restrict our initial exploration to single-class models, each having three parameters. Only one model matches epidemiological data. It also independently matches the incidence of de novo mutation in simplex families, the gender bias in unaffected siblings in simplex populations, and rates of mutation in target genes. This model makes strong and as yet not fully tested predictions, namely that females are the primary carriers in cases of genetic transmission, and that the incidence of de novo mutation in target genes for families at high risk for autism is not especially elevated. In its simplicity, this model does not account for MZ twin concordance or the distorted gender bias of high-functioning children with ASD, and does not accommodate all the known mechanisms contributing to ASD. We point to the next steps in applying the same computational framework to explore more complex models.

Genetic variability under the seed bank coalescent

Jochen Blath , Bjarki Eldon , Adrian Casanova , Noemi Kurt , Maite Wilke-Berenguer
doi: http://dx.doi.org/10.1101/017244

We analyse patterns of genetic variability of populations in the presence of a large seed bank with the help of a new coalescent structure called the seed bank coalescent. This ancestral process appears naturally as the scaling limit of the genealogy of large populations that sustain seed banks, if the seed bank size and individual dormancy times are of the same order as the active population. Mutations appear as Poisson processes on the active lineages, and potentially at a reduced rate also on the dormant lineages. The presence of 'dormant' lineages leads to qualitatively altered times to the most recent common ancestor and non-classical patterns of genetic diversity. To illustrate this we provide a Wright-Fisher model with a seed bank component and mutation, motivated by recent models of microbial dormancy, whose genealogy can be described by the seed bank coalescent. Based on our coalescent model, we derive recursions for the expectation and variance of the time to the most recent common ancestor, number of segregating sites, pairwise differences, and singletons. Estimates (obtained by simulations) of the distributions of commonly employed distance statistics, in the presence and absence of a seed bank, are compared. The effect of a seed bank on the expected site-frequency spectrum is also investigated using simulations. Our results indicate that the presence of a large seed bank considerably alters the distribution of some distance statistics, as well as the site-frequency spectrum. Thus, one should be able to detect the presence of a large seed bank in genetic data.

A simple biophysical model predicts more rapid accumulation of hybrid incompatibilities in small populations

Bhavin S. Khatri, Richard A. Goldstein
Comments: 13 pages, 6 figures
Subjects: Populations and Evolution (q-bio.PE)

Speciation is fundamental to the huge diversity of life on Earth. Evidence suggests reproductive isolation arises most commonly in allopatry, with a higher speciation rate in small populations. Current theory does not address this dependence in the important weak mutation regime. Here, we examine a biophysical model of speciation based on the binding of a protein transcription factor to a DNA binding site, and how their independent co-evolution in two allopatric lineages, in a stabilizing landscape, leads to incompatibilities. Our results give a new prediction for the monomorphic regime of evolution, consistent with data, that smaller populations should develop incompatibilities more quickly. This arises because: 1) smaller populations have a greater initial drift load, as there are more sequences that bind poorly than well, so fewer substitutions are needed to reach incompatible regions of phenotype space; 2) divergence is slower when the population size is larger than the inverse of discrete differences in fitness. Further, we find longer sequences develop incompatibilities more quickly at small population sizes, but more slowly at large population sizes. The biophysical model thus represents a robust mechanism of rapid reproductive isolation for small populations and large sequences that does not require peak-shifts or positive selection.

Analysis of adaptive walks on NK fitness landscapes with different interaction schemes

Stefan Nowak, Joachim Krug
Comments: 29 pages, 9 figures
Subjects: Populations and Evolution (q-bio.PE); Disordered Systems and Neural Networks (cond-mat.dis-nn)

Fitness landscapes are genotype-to-fitness mappings commonly used in evolutionary biology and computer science, and they are closely related to spin glass models. In this paper, we study the NK model for fitness landscapes where the interaction scheme between genes can be explicitly defined. The focus is on how this scheme influences the overall shape of the landscape. Our main tools for the analysis are adaptive walks, an idealized dynamics by which the population moves uphill in fitness and terminates at a local fitness maximum. We use three different types of walks and investigate how their length (the number of steps required to reach a local peak) and height (the fitness at the endpoint of the walk) depend on the dimensionality and structure of the landscape. We find that the distribution of local maxima over the landscape is particularly sensitive to the choice of interaction pattern. Most quantities that we measure are simply correlated with the rank of the scheme, which is equal to the number of nonzero coefficients in the expansion of the fitness landscape in terms of Walsh functions.
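To make the notions of walk length and height concrete, here is a minimal sketch of an adaptive walk on an NK landscape. It rests on several assumptions not taken from the abstract: binary loci, an adjacent-neighborhood interaction scheme in which each locus depends on itself and the next k loci (cyclically), uniform random fitness contributions, and a walk that moves to a randomly chosen fitter one-mutant neighbor. The abstract does not say which three walk types the authors use, so this walk rule and the names `make_nk_landscape` and `random_adaptive_walk` are purely illustrative.

```python
import random
from typing import Callable, Dict, List, Tuple


def make_nk_landscape(n: int, k: int, rng: random.Random) -> Callable[[List[int]], float]:
    """Build a fitness function for an NK landscape (assumed adjacent scheme)."""
    # ASSUMPTION: locus i interacts with itself and the next k loci, cyclically.
    neighborhoods = [[(i + d) % n for d in range(k + 1)] for i in range(n)]
    tables: List[Dict[Tuple[int, ...], float]] = [{} for _ in range(n)]

    def fitness(genotype: List[int]) -> float:
        total = 0.0
        for i, hood in enumerate(neighborhoods):
            key = tuple(genotype[j] for j in hood)
            if key not in tables[i]:
                # Draw each contribution once, then reuse it on later calls.
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / n

    return fitness


def random_adaptive_walk(
    start: List[int], fitness: Callable[[List[int]], float], rng: random.Random
) -> Tuple[List[int], int]:
    """Step to a random fitter one-mutant neighbor until none exists."""
    current = list(start)
    steps = 0
    while True:
        f_cur = fitness(current)
        fitter = []
        for i in range(len(current)):
            neighbor = current.copy()
            neighbor[i] ^= 1  # flip one locus
            if fitness(neighbor) > f_cur:
                fitter.append(neighbor)
        if not fitter:
            return current, steps  # local fitness maximum reached
        current = rng.choice(fitter)
        steps += 1


rng = random.Random(0)
fitness = make_nk_landscape(n=10, k=2, rng=rng)
start = [rng.randint(0, 1) for _ in range(10)]
peak, length = random_adaptive_walk(start, fitness, rng)
print("walk length:", length, "height:", round(fitness(peak), 3))
```

Swapping the `neighborhoods` list for a different interaction pattern (for example, randomly chosen partners) is the natural place to compare schemes in a sketch like this.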