Long non-coding RNA discovery in Anopheles gambiae using deep RNA sequencing

Adam M Jenkins, Robert M Waterhouse, Alan S Kopin, Marc A.T. Muskavitch

Long non-coding RNAs (lncRNAs) are mRNA-like transcripts longer than 200 nucleotides that have no protein-coding potential. lncRNAs have recently been implicated in epigenetic regulation, transcriptional and post-transcriptional gene regulation, and regulation of genomic stability in mammals, Caenorhabditis elegans, and Drosophila melanogaster. Using deep RNA sequencing of multiple Anopheles gambiae life stages, we have identified over 600 novel lncRNAs and more than 200 previously unannotated putative protein-coding genes. The lncRNAs exhibit differential expression profiles across life stages and adult genders. Those lncRNAs that are antisense to known protein-coding genes or are contained within intronic regions of protein-coding genes may mediate transcriptional repression or stabilization of associated mRNAs. lncRNAs exhibit faster rates of sequence evolution across anophelines than both previously known and newly identified protein-coding genes. This initial description of lncRNAs in An. gambiae offers the first genome-wide insights into long non-coding RNAs in this vector mosquito and defines a novel set of potential targets for the development of vector-based interventions that may curb the human malaria burden in disease-endemic countries.
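The working definition above (a transcript longer than 200 nt with no protein-coding potential) is often approximated in practice by a length cutoff plus an open-reading-frame (ORF) check. The Python sketch below is a hypothetical illustration of that heuristic, not the pipeline used in this study: the 100-codon ORF cutoff and the function names are assumptions, only the forward strand is scanned, and real pipelines often add dedicated coding-potential scores on top of such a filter.

```python
# Heuristic lncRNA candidate filter (illustrative sketch).
# Assumptions: forward strand only, ATG-initiated ORFs only,
# and a hypothetical 100-codon cutoff standing in for "no coding potential".
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Return the length, in codons (stop included), of the longest
    ATG...stop ORF in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = 0
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i + 3 - start) // 3)
                start = None  # ORFs that never hit a stop codon are ignored
    return best


def is_candidate_lncrna(seq: str, min_length: int = 200,
                        max_orf_codons: int = 100) -> bool:
    """Longer than min_length nt and no ORF reaching max_orf_codons."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons


print(is_candidate_lncrna("A" * 250))                    # True: long, no ORF
print(is_candidate_lncrna("ATG" + "GCT" * 150 + "TAA"))  # False: 152-codon ORF
```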

Comparative Performance of Two Whole Genome Capture Methodologies on Ancient DNA Illumina Libraries

Maria Avila-Arcos, Marcela Sandoval-Velasco, Hannes Schroeder, Meredith L Carpenter, Anna-Sapfo Malaspinas, Nathan Wales, Fernando Peñaloza, Carlos D Bustamante, M. Thomas P Gilbert

1. The application of whole genome capture (WGC) methods to ancient DNA (aDNA) promises to increase the efficiency of ancient genome sequencing. 2. We compared the performance of two recently developed WGC methods in enriching human aDNA within Illumina libraries built using both double-stranded (DSL) and single-stranded (SSL) library build protocols. Although both methods effectively enriched aDNA, one consistently produced marginally better results, giving us the opportunity to further explore the parameters influencing WGC experiments. 3. Our results suggest that bait length has an important influence on library enrichment. Moreover, we show that WGC is biased against the shorter molecules that are enriched in SSL preparation protocols. Therefore, application of WGC to such samples is not recommended without further optimization. Lastly, we document the effect of WGC on other features, including clonality, GC composition and repetitive DNA content of captured libraries. 4. Our findings provide insights for researchers planning to perform WGC on aDNA, and suggest future tests and optimization to improve WGC efficiency.

Author post: Facilitated diffusion buffers noise in gene expression

This guest post is by Radu Zabet on his preprint (with Armin Schoech) Facilitated diffusion buffers noise in gene expression, arXived here.

How do the binding dynamics of transcription factors affect the noise in gene expression?

Transcription factors (TFs) are proteins that bind to DNA and control gene activity. Gene regulation can be modelled as a set of chemical reactions, which are fundamentally stochastic processes. Given the importance of accurately controlling the gene regulatory program in the cell, significant efforts have been made to understand the noise properties of gene expression.

Why can noise in gene expression be modelled assuming an ON/OFF gene model?

With few exceptions, previous studies investigated the noise in gene expression assuming that the regulatory process is a two-state Markov model (genes switch stochastically between ON and OFF states). However, it is known that, mechanistically, transcription factors find their genomic target sites through facilitated diffusion, a combination of 3D diffusion in the cytoplasm/nucleoplasm and 1D random walk along the DNA, and this is likely to influence the noise properties of gene regulation. Previous experimental studies (e.g. see http://www.nature.com/ng/journal/v43/n6/full/ng.821.html) successfully modelled experimentally measured noise by assuming an ON/OFF gene model (two-state Markov model) in bacterial and animal cells. In this manuscript, we built a three-state Markov model that accurately captures facilitated diffusion, and we showed that for biologically relevant parameters, at least in bacteria (we assumed the lac repressor system http://www.sciencemag.org/content/336/6088/1595), noise in gene expression can be modelled with the ON/OFF gene model, but only if the binding/unbinding rates are adjusted accordingly. This explains why, in many cases, the experimentally measured noise in gene regulation can be modelled assuming an ON/OFF gene model. Note that there are several exceptions where the noise in gene expression does not seem to be accounted for by the ON/OFF gene model (e.g. http://genome.cshlp.org/content/early/2014/07/16/gr.168773.113 or http://www.pnas.org/content/111/29/10598).
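To make the ON/OFF (two-state Markov) model concrete, here is a minimal Python sketch of it: a Gillespie simulation of a gene that switches OFF to ON at rate k_on and ON to OFF at rate k_off, transcribes at rate r only while ON, and loses each mRNA at rate d. This is not the authors' three-state facilitated-diffusion model; the rate names and parameter values are illustrative assumptions. The closed-form mean and Fano factor used for comparison are the standard steady-state results for this two-state ("telegraph") model.

```python
import random


def telegraph_stats(k_on, k_off, r, d):
    """Steady-state mean and Fano factor of mRNA in the two-state model."""
    k = k_on + k_off
    mean = r * k_on / (d * k)
    fano = 1.0 + r * k_off / (k * (k + d))
    return mean, fano


def simulate_telegraph(k_on, k_off, r, d, t_end, seed=0):
    """Gillespie simulation; returns time-averaged mRNA mean and Fano factor."""
    rng = random.Random(seed)
    t, on, m = 0.0, False, 0
    w0 = w1 = w2 = 0.0  # time-weighted sums of 1, m and m**2
    while t < t_end:
        rates = (k_off if on else k_on,  # gene switches state
                 r if on else 0.0,       # transcription happens only when ON
                 d * m)                  # each mRNA degrades at rate d
        total = sum(rates)
        dt = min(rng.expovariate(total), t_end - t)
        w0 += dt
        w1 += m * dt
        w2 += m * m * dt
        t += dt
        if t >= t_end:
            break
        u = rng.random() * total  # pick which reaction fires
        if u < rates[0]:
            on = not on
        elif u < rates[0] + rates[1]:
            m += 1
        else:
            m -= 1
    mean = w1 / w0
    return mean, (w2 / w0 - mean ** 2) / mean


print(telegraph_stats(1.0, 1.0, 10.0, 1.0))  # analytic: mean 5.0, Fano ~2.67
print(simulate_telegraph(1.0, 1.0, 10.0, 1.0, t_end=20000.0, seed=1))
```

With k_on = k_off = 1, r = 10 and d = 1 (arbitrary units), the analytic Fano factor is 1 + 10/6, and a long simulation should land close to it. In this sketch, "adjusting the binding/unbinding rates" corresponds to choosing effective values of k_on and k_off.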

What is the effect of facilitated diffusion on the noise in gene expression?

Next, assuming the ON/OFF gene model, we investigated the advantage, in terms of noise in gene expression, that a TF performing facilitated diffusion has over an equivalent TF that performs only 3D diffusion (and no 1D random walk on the DNA). Our results show that the noise in gene expression can be reduced significantly when the TF performs facilitated diffusion. This is important because, while most studies identify the speedup of the target search as the main evolutionary advantage of facilitated diffusion, we show that, in addition to this speedup in binding kinetics, facilitated diffusion also reduces the noise in gene expression. Interestingly, with facilitated diffusion the noise in gene expression appears to be reduced close to the noise level of an unregulated gene (the lowest noise level achievable), while with an equivalent TF that performs only 3D diffusion the noise is significantly higher.

Finally, to test our model, we parameterised it with values measured experimentally for the lac repressor in E. coli and estimated the mean mRNA level to be 0.16 and the Fano factor (variance divided by mean) to be 1.3 (as opposed to 2.0 for a TF performing only 3D diffusion). These values are similar to those measured experimentally in the low-inducer case of Plac by http://www.nature.com/ng/journal/v43/n6/full/ng.821.html (mean mRNA level of 0.15 and Fano factor of 1.25), showing that facilitated diffusion is essential to explain the experimentally measured noise in mRNA.
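The Fano factor quoted above is the variance of the mRNA copy number across cells divided by its mean; a Poisson-distributed (constitutive, unregulated) gene gives a value of 1. A minimal Python sketch for computing it from per-cell counts follows. The counts are made-up illustrative values, not data from the preprint, and using the population rather than the sample variance is an assumption.

```python
from statistics import mean, pvariance


def fano_factor(counts):
    """Variance divided by mean of per-cell mRNA counts.

    Uses the population variance; the sample variance (n - 1) is an equally
    common choice and differs little when many cells are measured.
    """
    mu = mean(counts)
    if mu == 0:
        raise ValueError("Fano factor is undefined when the mean is zero")
    return pvariance(counts) / mu


# Hypothetical counts from 10 cells (not data from the preprint).
cells = [0] * 7 + [1] * 2 + [3]
print(mean(cells), fano_factor(cells))
```

Here the mean is 0.5 and the variance 0.85, giving a Fano factor of 1.7; values above 1 indicate noise beyond the Poisson baseline.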

Author post: Sharing of Very Short IBD Segments between Humans, Neandertals, and Denisovans

This guest post is by Gundula Povysil and Sepp Hochreiter on their preprint Sharing of Very Short IBD Segments between Humans, Neandertals, and Denisovans, bioRxived here.

We have updated our preprint Sharing of Very Short IBD Segments between Humans, Neandertals, and Denisovans on bioRxiv; it now presents results not only for chromosome 1 but for all autosomes and chromosome X.

In this manuscript we analyze the sharing of very short identity by descent (IBD) segments between humans, Neandertals, and Denisovans to gain new insights into their demographic history. In the updated version we included a separate chromosome X analysis (both IBD segment sharing and length of segments). We identified IBD segments in the 1000 Genomes Project sequencing data using our recently published method HapFABIA, many of which are shared with Neandertals or Denisovans.

Here we highlight the most interesting findings of our analysis:

Introgression from Denisovans into ancestors of Asians:

The Denisova genome most prominently matches IBD segments that are shared by Asians, and on average these segments are longer than those shared between other continental populations and the Denisova genome. We could therefore confirm an introgression from Denisovans into the ancestors of Asians after their migration out of Africa.

Introgression from Neandertals into ancestors of Europeans and Asians:

While Neandertal-matching IBD segments are most often shared by Asians, Europeans also share a considerably higher percentage of IBD segments with Neandertals than other populations do. Neandertal-matching IBD segments shared by Asians or Europeans are longer than those observed in Africans. These IBD segments hint at gene flow from Neandertals into the ancestors of Asians and Europeans after they left Africa.

Ancient Neandertal and Denisova IBD segments survived only in Africans:

Interestingly, many Neandertal- and/or Denisova-matching IBD segments are predominantly observed in Africans, some of them even exclusively. IBD segments shared between Africans and Neandertals or Denisovans are strikingly short, so we assume that they are very old. Consequently, we conclude that DNA regions from ancestors of humans, Neandertals, and Denisovans have survived in Africans.
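The step from "short" to "old" rests on recombination breaking shared segments down over time. As a rough textbook approximation (assuming a uniform recombination rate and ignoring the demographic details modelled in the preprint), if two haplotypes share a segment inherited from a common ancestor g generations ago, recombination has acted along 2g meioses, so the segment length L in Morgans is approximately exponentially distributed:

$$
L \sim \mathrm{Exp}(2g), \qquad \mathbb{E}[L] \approx \frac{1}{2g}\ \text{Morgans} = \frac{50}{g}\ \text{cM}.
$$

For example, g = 10,000 generations gives about 0.005 cM, or roughly 5 kb at an average rate of about 1 cM/Mb, which is why very old shared segments are expected to be very short.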

Neandertal but no Denisova introgression on the X chromosome:

Neandertal-matching IBD segments on chromosome X confirm gene flow from Neandertals into ancestors of Asians and Europeans outside Africa. Interestingly, there is hardly any signal of Denisova introgression on the X chromosome.

We would greatly appreciate any comments, discussion, or thoughts on our results.

Butter: High-precision genomic alignment of small RNA-seq data

Michael J Axtell

Eukaryotes produce large numbers of small non-coding RNAs that act as specificity determinants for various gene-regulatory complexes. These include microRNAs (miRNAs), endogenous short interfering RNAs (siRNAs), and Piwi-associated RNAs (piRNAs). These RNAs can be discovered, annotated, and quantified using small RNA-seq, a variant RNA-seq method based on highly parallel sequencing. Alignment to a reference genome is a critical step in analysis of small RNA-seq data. Because of their small size (20-30 nts depending on the organism and sub-type) and tendency to originate from multi-gene families or repetitive regions, reads that align equally well to more than one genomic location are very common. Typical methods to deal with multi-mapped small RNA-seq reads sacrifice either precision or sensitivity. The tool ‘butter’ balances precision and sensitivity by placing multi-mapped reads using an iterative approach, where the decision between possible locations is dictated by the local densities of more confidently aligned reads. Butter displays superior performance relative to other small RNA-seq aligners. Treatment of multi-mapped small RNA-seq reads has substantial impacts on downstream analyses, including quantification of MIRNA paralogs, and discovery of endogenous siRNA loci. Butter is freely available under a GNU general public license.
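The placement rule described above can be sketched in a few lines of Python. This is a simplified, hypothetical illustration of density-guided placement, not butter's actual implementation. The 50-nt window, the (chrom, pos) locus representation, the function names, and the choice to place multi-mapped reads one at a time (so that each placement updates the densities seen by later reads) are all assumptions.

```python
import random
from collections import Counter


def local_density(placed, locus, window):
    """Number of already-placed reads within +/- window nt of locus."""
    chrom, pos = locus
    return sum(placed[(chrom, p)] for p in range(pos - window, pos + window + 1))


def place_reads(reads, window=50, seed=0):
    """Assign each read to one of its candidate loci.

    reads: list of candidate lists; each candidate is a (chrom, pos) tuple.
    Unique reads are placed first; multi-mapped reads are then placed one at
    a time, weighted by the local density of reads placed so far.
    """
    rng = random.Random(seed)
    placed = Counter()  # missing keys count as 0 and are not inserted
    assignment = [None] * len(reads)
    for i, loci in enumerate(reads):  # pass 1: uniquely aligned reads
        if len(loci) == 1:
            assignment[i] = loci[0]
            placed[loci[0]] += 1
    for i, loci in enumerate(reads):  # pass 2: multi-mapped reads
        if len(loci) > 1:
            weights = [local_density(placed, loc, window) for loc in loci]
            if sum(weights) == 0:
                choice = rng.choice(loci)  # no local evidence: random pick
            else:
                choice = rng.choices(loci, weights=weights, k=1)[0]
            assignment[i] = choice
            placed[choice] += 1  # later decisions see this placement
    return assignment


reads = [[("chr1", 100)]] * 5 + [[("chr1", 110), ("chr2", 500)]]
print(place_reads(reads)[-1])  # ('chr1', 110): only chr1 has nearby reads
```

Weighting by local density lets a multi-mapped read follow the uniquely aligned reads at an expressed locus rather than being discarded or split evenly, which is the precision/sensitivity balance the abstract describes.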

Facilitated diffusion buffers noise in gene expression

Armin Schoech, Nicolae Radu Zabet
(Submitted on 22 Jul 2014)

Transcription factors perform facilitated diffusion (3D diffusion in the cytosol and 1D diffusion on the DNA) when binding to their target sites to regulate gene expression. Here, we investigated the influence of this binding mechanism on the noise in gene expression. Our results showed that, for biologically relevant parameters, the binding process can be represented by a two-state Markov model and that the accelerated target finding due to facilitated diffusion leads to a reduction in both the mRNA and the protein noise.

Clonal interference and Muller’s ratchet in spatial habitats

Jakub Otwinowski, Joachim Krug
(Submitted on 18 Feb 2013 (v1), last revised 23 Jul 2014 (this version, v3))

Competition between independently arising beneficial mutations is enhanced in spatial populations due to the linear rather than exponential growth of clones. Recent theoretical studies have pointed out that the resulting fitness dynamics is analogous to a surface growth process, where new layers nucleate and spread stochastically, leading to the build-up of scale-invariant roughness. This scenario differs qualitatively from the standard view of adaptation in that the speed of adaptation becomes independent of population size while the fitness variance does not. Here we exploit recent progress in the understanding of surface growth processes to obtain precise predictions for the universal, non-Gaussian shape of the fitness distribution for one-dimensional habitats, which are verified by simulations. When the mutations are deleterious rather than beneficial the problem becomes a spatial version of Muller’s ratchet. In contrast to the case of well-mixed populations, the rate of fitness decline remains finite even in the limit of an infinite habitat, provided the ratio U_d/s² between the deleterious mutation rate and the square of the (negative) selection coefficient is sufficiently large. Using again an analogy to surface growth models we show that the transition between the stationary and the moving state of the ratchet is governed by directed percolation.

Statistical and conceptual challenges in the comparative analysis of principal components

Josef C Uyeda, Daniel S. Caetano, Matthew W Pennell

Quantitative geneticists long ago recognized the value of studying evolution in a multivariate framework (Pearson, 1903). Due to linkage, pleiotropy, coordinated selection and mutational covariance, the evolutionary response in any phenotypic trait can only be properly understood in the context of other traits (Lande, 1979; Lynch and Walsh, 1998). This is, of course, also well appreciated by comparative biologists. However, unlike in quantitative genetics, most of the statistical and conceptual tools for analyzing phylogenetic comparative data (recently reviewed in Pennell and Harmon, 2013) are designed for analyzing a single trait (but see, for example, Revell and Harmon, 2008; Revell and Harrison, 2008; Hohenlohe and Arnold, 2008; Revell and Collar, 2009; Schmitz and Motani, 2011; Adams, 2014b). Indeed, even classical approaches for testing for correlated evolution between two traits (e.g., Felsenstein, 1985; Grafen, 1989; Harvey and Pagel, 1991) are not actually multivariate, as each trait is assumed to have evolved under a process that is independent of the state of the other (Hansen and Orzack, 2005; Hansen and Bartoszek, 2012). As a result of these limitations, researchers with multivariate datasets are often faced with a choice: analyze each trait as if it were independent, or else decompose the dataset into a statistically independent set of traits, such that each can be analyzed with univariate methods.
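The second option, decomposing correlated traits into principal components, can be illustrated for two traits with a minimal standard-library Python sketch: the PC scores it returns are uncorrelated by construction. Note that this ordinary PCA treats species as independent observations and so ignores phylogenetic non-independence; the function name and trait values are hypothetical.

```python
import math


def pca_two_traits(xs, ys):
    """Principal components of two traits via the 2x2 covariance matrix.

    Returns (eigenvalues, pc1_scores, pc2_scores). Species are treated as
    independent observations, i.e. phylogeny is ignored.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]
    cy = [y - my for y in ys]
    a = sum(u * u for u in cx) / (n - 1)              # var(x)
    c = sum(v * v for v in cy) / (n - 1)              # var(y)
    b = sum(u * v for u, v in zip(cx, cy)) / (n - 1)  # cov(x, y)
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]].
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    eigenvalues = (half_trace + radius, half_trace - radius)
    # Angle of the leading eigenvector: tan(2 * theta) = 2b / (a - c).
    theta = 0.5 * math.atan2(2 * b, a - c)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))  # orthogonal second axis
    pc1 = [u * v1[0] + v * v1[1] for u, v in zip(cx, cy)]
    pc2 = [u * v2[0] + v * v2[1] for u, v in zip(cx, cy)]
    return eigenvalues, pc1, pc2


# Hypothetical values of two traits measured in five species.
(lam1, lam2), pc1, pc2 = pca_two_traits([1, 2, 3, 4, 5], [2, 1, 4, 3, 6])
cov_pc = sum(p * q for p, q in zip(pc1, pc2)) / (len(pc1) - 1)
print(lam1, lam2, round(cov_pc, 12))  # PC scores are uncorrelated
```

The variances of the two score vectors equal the two eigenvalues, and their covariance is zero up to rounding, which is what makes each component analyzable "on its own".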

Concerning RNA-Guided Gene Drives for the Alteration of Wild Populations

Kevin M Esvelt, Andrea L Smidler, Flaminia Catteruccia, George M Church

Gene drives may be capable of addressing ecological problems by altering entire populations of wild organisms, but their use has remained largely theoretical due to technical constraints. Here we consider the potential for RNA-guided gene drives based on the CRISPR nuclease Cas9 to serve as a general method for spreading altered traits through wild populations over many generations. We detail likely capabilities, discuss limitations, and provide novel precautionary strategies to control the spread of gene drives and reverse genomic changes. The ability to edit populations of sexual species would offer substantial benefits to humanity and the environment. For example, RNA-guided gene drives could potentially prevent the spread of disease, support agriculture by reversing pesticide and herbicide resistance in insects and weeds, and control damaging invasive species. However, the possibility of unwanted ecological effects and near-certainty of spread across political borders demand careful assessment of each potential application. We call for thoughtful, inclusive, and well-informed public discussions to explore the responsible use of this currently theoretical technology.
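Why a drive can spread "over many generations" even from a low starting frequency can be seen in a deliberately simple deterministic model. This is an assumption-laden sketch, not a model from the paper: it assumes random mating, no fitness costs and no resistance alleles, and lets a heterozygote transmit the drive allele with probability (1 + e)/2, where e is a hypothetical homing (conversion) efficiency. The drive-allele frequency then updates as p' = p² + p(1 - p)(1 + e) = p + e·p(1 - p).

```python
def drive_frequency_trajectory(p0, e, generations):
    """Drive-allele frequency per generation under a simple homing model.

    Assumes random mating, no fitness cost and no resistance alleles;
    heterozygotes transmit the drive with probability (1 + e) / 2, which gives
    p' = p**2 + p*(1 - p)*(1 + e) = p + e*p*(1 - p).
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p + e * p * (1.0 - p)
        freqs.append(p)
    return freqs


traj = drive_frequency_trajectory(p0=0.01, e=0.9, generations=12)
print([round(p, 3) for p in traj])
```

With e = 0 this reduces to Mendelian inheritance and p stays constant; with e = 1 and p0 = 0.01 the first update gives 0.0199, so the drive roughly doubles each generation while rare. Fitness costs, resistance alleles and spatial structure all slow or stop this spread, which is part of why each application needs careful assessment.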

Assessing allele specific expression across multiple tissues from RNA-seq read data

Matti Pirinen, Tuuli Lappalainen, Noah A Zaitlen, GTEx Consortium, Emmanouil T Dermitzakis, Peter Donnelly, Mark I McCarthy, Manuel A Rivas

Motivation: RNA sequencing enables allele specific expression (ASE) studies that complement standard genotype expression studies for common variants and, importantly, also allow measuring the regulatory impact of rare variants. The Genotype-Tissue Expression project (GTEx) is collecting RNA-seq data on multiple tissues of the same set of individuals, and novel methods are required for the analysis of these data. Results: We present a statistical method to compare different patterns of ASE across tissues and to classify genetic variants according to their impact on the tissue-wide expression profile. We focus on strong ASE effects that we expect to see for protein-truncating variants, but our method can also be adjusted for other types of ASE effects. We illustrate the method with a real data example of the tissue-wide expression profile of a variant causal for lipoid proteinosis, and with a simulation study to assess our method more generally. Availability: MAMBA software: http://birch.well.ox.ac.uk/~rivas/mamba/ R source code and data examples: http://www.iki.fi/mpirinen/ Contact: matti.pirinen@helsinki.fi rivas@well.ox.ac.uk
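As a point of reference for what a multi-tissue method generalizes, the simplest single-site, single-tissue check for allelic imbalance is an exact two-sided binomial test of reference versus alternative read counts at a heterozygous site against an expected ratio of 0.5. The Python sketch below is that textbook test, not the MAMBA model. It ignores reference-mapping bias and overdispersion, and the function name is hypothetical.

```python
from math import comb


def ase_binomial_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial test of ref vs alt read counts against p = 0.5.

    Because Binomial(n, 0.5) is symmetric, the two-sided p-value is twice the
    smaller tail probability, capped at 1.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(ase_binomial_pvalue(8, 2))  # 0.109375
```

Eight reference and two alternative reads give p of about 0.11, so a single site with ten reads in one tissue is weak evidence on its own, which is one motivation for modelling ASE jointly across tissues.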