Transposable elements contribute to activation of maize genes in response to abiotic stress
Irina Makarevitch, Amanda J Waters, Patrick T West, Michelle C Stitzer, Jeffrey Ross-Ibarra, Nathan M Springer
Transposable elements (TEs) account for a large portion of the genome in many eukaryotic species. Despite their reputation as “junk” DNA or genomic parasites deleterious for the host, TEs have complex interactions with host genes and the potential to contribute to regulatory variation in gene expression. It has been hypothesized that TEs and genes they insert near may be transcriptionally activated in response to stress conditions. The maize genome, with many different types of TEs interspersed with genes, provides an ideal system to study the genome-wide influence of TEs on gene regulation. To analyze the magnitude of the TE effect on gene expression response to environmental changes, we profiled gene and TE transcript levels in maize seedlings exposed to a number of abiotic stresses. Many genes exhibit up- or down-regulation in response to these stress conditions. The analysis of TE families inserted within upstream regions of up-regulated genes revealed that between four and nine different TE families are associated with up-regulated gene expression in each of these stress conditions, affecting up to 20% of the genes up-regulated in response to abiotic stress and as many as 33% of genes that are only expressed in response to stress. Expression of many of these same TE families also responds to the same stress conditions. The analysis of the stress- induced transcripts and proximity of the transposon to the gene suggests that these TEs may provide local enhancer activities that stimulate stress-responsive gene expression. Our data on allelic variation for insertions of several of these TEs show strong correlation between the presence of TE insertions and stress-responsive up-regulation of gene expression. Our findings suggest that TEs provide an important source of allelic regulatory variation in gene response to abiotic stress in maize.
The meta-epigenomic structure of purified human stem cell populations is defined at cis-regulatory sequences
N. Ari Wijetunga, Fabien Delahaye, Yong Mei Zhao, Aaron Golden, Jessica C Mar, Francine H. Einstein, John M. Greally
The mechanism and significance of epigenetic variability in the same cell type between healthy individuals are not clear. Here, we purify human CD34+ hematopoietic stem and progenitor cells (HSPCs) from different individuals and find that there is increased variability of DNA methylation at loci with properties of promoters and enhancers. The variability is especially enriched at candidate enhancers near genes transitioning between silent and expressed states, and encoding proteins with leukocyte differentiation properties. Our findings of increased variability at loci with intermediate DNA methylation values, at candidate “poised” enhancers, and at genes involved in HSPC lineage commitment suggest that CD34+ cell subtype heterogeneity between individuals is a major mechanism for the variability observed. Epigenomic studies performed on cell populations, even when purified, are testing collections of epigenomes, or meta-epigenomes. Our findings show that meta-epigenomic approaches to data analysis can provide insights into cell subpopulation structure.
Facilitated diffusion buffers noise in gene expression
Armin Schoech, Nicolae Radu Zabet
(Submitted on 22 Jul 2014)
Transcription factors perform facilitated diffusion (3D diffusion in the cytosol and 1D diffusion on the DNA) when binding to their target sites to regulate gene expression. Here, we investigated the influence of this binding mechanism on the noise in gene expression. Our results showed that, for biologically relevant parameters, the binding process can be represented by a two-state Markov model and that the accelerated target finding due to facilitated diffusion leads to a reduction in both the mRNA and the protein noise.
Assessing allele specific expression across multiple tissues from RNA-seq read data
Matti Pirinen, Tuuli Lappalainen, Noah A Zaitlen, GTEx Consortium, Emmanouil T Dermitzakis, Peter Donnelly, Mark I McCarthy, Manuel A Rivas
Motivation: RNA sequencing enables allele specific expression (ASE) studies that complement standard genotype expression studies for common variants and, importantly, also allow measuring the regulatory impact of rare variants. The Genotype-Tissue Expression project (GTEx) is collecting RNA-seq data on multiple tissues of a same set of individuals and novel methods are required for the analysis of these data. Results: We present a statistical method to compare different patterns of ASE across tissues and to classify genetic variants according to their impact on the tissue-wide expression profile. We focus on strong ASE effects that we are expecting to see for protein-truncating variants, but our method can also be adjusted for other types of ASE effects. We illustrate the method with a real data example on a tissue-wide expression profile of a variant causal for lipoid proteinosis, and with a simulation study to assess our method more generally. Availability: MAMBA software: http://birch.well.ox.ac.uk/~rivas/mamba/ R source code and data examples: http://www.iki.fi/mpirinen/ Contact: firstname.lastname@example.org email@example.com
RNA-seq gene profiling – a systematic empirical comparison
Nuno A Fonseca, John A Marioni, Alvis Brazma
Accurately quantifying gene expression levels is a key goal of experiments using RNA-sequencing to assay the transcriptome. This typically requires aligning the short reads generated to the genome or transcriptome before quantifying expression of pre-defined sets of genes. Differences in the alignment/quantification tools can have a major effect upon the expression levels found with important consequences for biological interpretation. Here we address two main issues: do different analysis pipelines affect the gene expression levels inferred from RNA-seq data? And, how close are the expression levels inferred to the “true” expression levels? We evaluate fifty gene profiling pipelines in experimental and simulated data sets with different characteristics (e.g, read length and sequencing depth). In the absence of knowledge of the ‘ground truth’ in real RNAseq data sets, we used simulated data to assess the differences between the true expression and those reconstructed by the analysis pipelines. Even though this approach does not take into account all known biases present in RNAseq data, it still allows to assess the accuracy of the gene expression values inferred by different analysis pipelines. The results show that i) overall there is a high correlation between the expression levels inferred by the best pipelines and the true quantification values; ii) the error in the estimated gene expression values can vary considerably across genes; and iii) a small set of genes have expression estimates with consistently high error (across data sets and methods). Finally, although the mapping software is important, the quantification method makes a greater difference to the results.
THE GENETIC LANDSCAPE OF TRANSCRIPTIONAL NETWORKS IN A COMBINED HAPLOID/DIPLOID PLANT SYSTEM
Jukka-Pekka Verta, Christian R Landry, John J MacKay
Heritable variation in gene expression is a source of evolutionary change and our understanding of the genetic basis of expression variation remains incomplete. Here, we dissected the genetic basis of transcriptional variation in a wild, outbreeding gymnosperm (Picea glauca) according to linked and unlinked genetic variants, their allele-specific (cis) and allele non-specific (trans) effects, and their phenotypic additivity. We used a novel plant system that is based on the analysis of segregating alleles of a single self-fertilized plant in haploid and diploid seed tissues. We measured transcript abundance and identified transcribed SNPs in 66 seeds with RNA-seq. Linked and unlinked genetic effects that influenced expression levels were abundant in the haploid megagametophyte tissue, influencing 48% and 38% of analyzed genes, respectively. Analysis of these effects in diploid embryos revealed that while distant effects were acting in trans consistent with their hypothesized diffusible nature, local effects were associated with a complex mix of cis, trans and compensatory effects. Most cis effects were additive irrespective of their effect sizes, consistent with a hypothesis that they represent rate-limiting factors in transcript accumulation. We show that trans effects fulfilled a key prediction of Wright?s physiological theory, in which variants with small effects tend to be additive and those with large effects tend to be dominant/recessive. Our haploid/diploid approach allows a comprehensive genetic dissection of expression variation and can be applied to a large number of wild plant species.
Benchmark Analysis of Algorithms for Determining and Quantifying Full-length mRNA Splice Forms from RNA-Seq Data
Katharina Hayer, Angel Pizzaro, Nicholas L Lahens, John B Hogenesch, Gregory R Grant
The advantages of RNA sequencing (RNA-Seq) suggest it will replace microarrays for highly parallel gene expression analysis. For example, in contrast to arrays, RNA-Seq is expected to be able to provide accurate identification and quantification of full-length transcripts. A number of methods have been developed for this purpose, but short error prone reads makes it a difficult problem in practice. It is essential to determine which algorithms perform best, and where and why they fail. However, there is a dearth of independent and unbiased benchmarking studies of these algorithms. Here we take an approach using both simulated and experimental benchmark data to evaluate their accuracy. We conclude that most methods are inaccurate even using idealized data, and that no is method sufficiently accurate once complicating factors such as polymorphisms, intron signal, sequencing error, and multiple splice forms are present. These results point to the pressing need for further algorithm development.
Pervasive variation of transcription factor orthologs contributes to regulatory network evolution
Shilpa Nadimpalli, Anton V. Persikov, Mona Singh
Comments: 29 pages, 5 figures, 5 supplemental figures, 3 supplemental tables
Subjects: Genomics (q-bio.GN)
Differences in transcriptional regulatory networks underlie much of the phenotypic variation observed across organisms. Changes to cis-regulatory elements are widely believed to be the predominant means by which regulatory networks evolve, yet examples of regulatory network divergence due to transcription factor (TF) variation have also been observed. To systematically ascertain the extent to which TFs contribute to regulatory divergence, we analyzed the evolution of the largest class of metazoan TFs, Cys2-His2 zinc finger (C2H2-ZF) TFs, across 12 Drosophila species spanning ~45 million years of evolution. Remarkably, we uncovered that a significant fraction of all C2H2-ZF 1-to-1 orthologs in flies exhibit variations that can affect their DNA-binding specificities. In addition to loss and recruitment of C2H2-ZF domains, we found diverging DNA-contacting residues in ~47% of domains shared between D. melanogaster and the other fly species. These diverging DNA-contacting residues, found in ~66% of the D. melanogaster C2H2-ZF genes in our analysis and corresponding to ~24% of all annotated D. melanogaster TFs, show evidence of functional constraint: they tend to be conserved across phylogenetic clades and evolve slower than other diverging residues. These same variations were rarely found as polymorphisms within a population of D. melanogaster flies, indicating their rapid fixation. The predicted specificities of these dynamic domains gradually change across phylogenetic distances, suggesting stepwise evolutionary trajectories for TF divergence. Further, whereas proteins with conserved C2H2-ZF domains are enriched in developmental functions, those with varying domains exhibit no functional enrichments. Our work suggests that a subset of highly dynamic and largely unstudied TFs are a likely source of regulatory variation in Drosophila and other metazoans.
Genome-wide Identification of Zero Nucleotide Recursive Splicing in Drosophila
Michael O Duff, Sara Olson, Xintao Wei, Ahmad Osman, Alex Plocik, Mohan Bolisetty, Susan Celniker, Brenton Graveley
Recursive splicing is a process in which large introns are removed in multiple steps by resplicing at ratchet points – 5? splice sites recreated after splicing. Recursive splicing was first identified in the Drosophila Ultrabithorax (Ubx) gene and only three additional Drosophila genes have since been experimentally shown to undergo recursive splicing. Here, we identify 196 zero nucleotide exon ratchet points in 130 introns of 115 Drosophila genes from total RNA sequencing data generated from developmental time points, dissected tissues, and cultured cells. Recursive splicing events were identified by splice junctions that map to annotated 5? splice sites and unannotated intronic 3? splice sites, the presence of the sequence AG/GT at the 3? splice site, and a 5? to 3? gradient of decreasing RNA-Seq read density indicative of co-transcriptional splicing. The sequential nature of recursive splicing was confirmed by identification of lariat introns generated by splicing to and from the ratchet points. We also show that recursive splicing is a constitutive process, and that the sequence and function of ratchet points are evolutionarily conserved. Together these results indicate that recursive splicing is commonly used in Drosophila and provides insight into the mechanisms by which some introns are removed.
Methylation QTLs are associated with coordinated changes in transcription factor binding, histone modifications, and gene expression levels.
Nicholas E Banovich, Xun Lan, Graham McVicker, Bryce Van de Geijn, Jacob F Degner, John D. Blischak, Jonathan K. Pritchard, Yoav Gilad
DNA methylation is an important epigenetic regulator of gene expression. Recent studies have revealed widespread associations between genetic variation and methylation levels. However, the mechanistic links between genetic variation and methylation remain unclear. To begin addressing this gap, we collected methylation data at ~300,000 loci in lymphoblastoid cell lines (LCLs) from 64 HapMap Yoruba individuals, and genome-wide bisulfite sequence data in ten of these individuals. We identified (at an FDR of 10%) 11,752 methylation QTLs (meQTLs)?i.e., loci in which genetic variation is associated with changes in DNA methylation. We found that meQTLs are frequently associated with changes in methylation at multiple CpGs across regions of up to 3 kb. Interestingly, meQTLs are also frequently associated with variation in other properties of gene regulation, including histone modifications, DNase I accessibility, chromatin accessibility, and expression levels of nearby genes. These observations suggest that genetic variants may lead to coordinated molecular changes in all of these regulatory phenotypes. One plausible driver of coordinated changes in different regulatory mechanisms is variation in transcription factor (TF) binding. Indeed, we found that SNPs that change predicted TF binding affinities are significantly enriched for associations with DNA methylation at nearby CpGs. Taken together, our observations are consistent with a model whereby changes in TF binding may frequently drive coordinated changes in DNA methylation, histone modification, and gene expression levels.