The Genetic Architecture of Gene Expression Levels in Wild Baboons

Jenny Tung, Xiang Zhou, Susan C Alberts, Matthew Stephens, Yoav Gilad

Gene expression variation is well documented in human populations and its genetic architecture has been extensively explored. However, we still know little about the genetic architecture of gene expression variation in other species, particularly our closest living relatives, the nonhuman primates. To address this gap, we performed an RNA sequencing (RNA-seq)-based study of 63 wild baboons, members of the intensively studied Amboseli baboon population in Kenya. Our study design allowed us to measure gene expression levels and identify genetic variants using the same data set, enabling us to perform complementary mapping of putative cis-acting expression quantitative trait loci (eQTL) and measurements of allele-specific expression (ASE) levels. We discovered substantial evidence for genetic effects on gene expression levels in this population. Surprisingly, we found more power to detect individual eQTL in the baboons relative to a HapMap human data set of comparable size, probably as a result of greater genetic variation, enrichment of SNPs with high minor allele frequencies, and longer-range linkage disequilibrium in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes. Interestingly, genes with eQTL significantly overlapped between the baboon and human data sets, suggesting that some genes may tolerate more genetic perturbation than others, and that this property may be conserved across species. Finally, we used a Bayesian sparse linear mixed model to partition genetic, demographic, and early environmental contributions to variation in gene expression levels. We found a strong genetic contribution to gene expression levels for almost all genes, while individual demographic and environmental effects tended to be more modest. Together, our results establish the feasibility of eQTL mapping using RNA-seq data alone, and act as an important first step towards understanding the genetic architecture of gene expression variation in nonhuman primates.
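
To make the idea of cis-eQTL mapping concrete, the following is a minimal sketch of the simplest possible test: an ordinary least-squares regression of one gene's expression level on genotype dosage at one nearby SNP. It is not the authors' pipeline; it assumes unrelated individuals and ignores covariates, relatedness, and multiple testing, all of which a real analysis (such as the mixed-model approach mentioned in the abstract) would need to handle. The function name `eqtl_ols` and the toy data are hypothetical.

```python
def eqtl_ols(dosages, expression):
    """Regress expression on genotype dosage (0, 1 or 2 copies of one allele).

    Returns (slope, intercept, r_squared). The slope is the estimated
    additive effect of one extra allele copy on expression.
    Illustrative only: no covariates, no correction for relatedness.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matching vectors with at least 3 individuals")

    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))

    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")

    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Toy example (made-up numbers): six individuals, one SNP, one gene.
toy_dosages = [0, 1, 2, 0, 1, 2]
toy_expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
slope, intercept, r_squared = eqtl_ols(toy_dosages, toy_expression)
```

In a genome-wide scan this regression would be repeated for every gene and every SNP within a chosen cis window, with significance assessed by permutation or a false discovery rate procedure.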

Transposable elements contribute to activation of maize genes in response to abiotic stress

Irina Makarevitch, Amanda J Waters, Patrick T West, Michelle C Stitzer, Jeffrey Ross-Ibarra, Nathan M Springer

Transposable elements (TEs) account for a large portion of the genome in many eukaryotic species. Despite their reputation as “junk” DNA or genomic parasites deleterious for the host, TEs have complex interactions with host genes and the potential to contribute to regulatory variation in gene expression. It has been hypothesized that TEs and the genes they insert near may be transcriptionally activated in response to stress conditions. The maize genome, with many different types of TEs interspersed with genes, provides an ideal system to study the genome-wide influence of TEs on gene regulation. To analyze the magnitude of the TE effect on gene expression response to environmental changes, we profiled gene and TE transcript levels in maize seedlings exposed to a number of abiotic stresses. Many genes exhibit up- or down-regulation in response to these stress conditions. The analysis of TE families inserted within upstream regions of up-regulated genes revealed that between four and nine different TE families are associated with up-regulated gene expression in each of these stress conditions, affecting up to 20% of the genes up-regulated in response to abiotic stress and as many as 33% of genes that are only expressed in response to stress. Expression of many of these same TE families also responds to the same stress conditions. The analysis of the stress-induced transcripts and proximity of the transposon to the gene suggests that these TEs may provide local enhancer activities that stimulate stress-responsive gene expression. Our data on allelic variation for insertions of several of these TEs show a strong correlation between the presence of TE insertions and stress-responsive up-regulation of gene expression. Our findings suggest that TEs provide an important source of allelic regulatory variation in gene response to abiotic stress in maize.

The meta-epigenomic structure of purified human stem cell populations is defined at cis-regulatory sequences

N. Ari Wijetunga, Fabien Delahaye, Yong Mei Zhao, Aaron Golden, Jessica C Mar, Francine H. Einstein, John M. Greally

The mechanism and significance of epigenetic variability in the same cell type between healthy individuals are not clear. Here, we purify human CD34+ hematopoietic stem and progenitor cells (HSPCs) from different individuals and find that there is increased variability of DNA methylation at loci with properties of promoters and enhancers. The variability is especially enriched at candidate enhancers near genes transitioning between silent and expressed states, and encoding proteins with leukocyte differentiation properties. Our findings of increased variability at loci with intermediate DNA methylation values, at candidate “poised” enhancers, and at genes involved in HSPC lineage commitment suggest that CD34+ cell subtype heterogeneity between individuals is a major mechanism for the variability observed. Epigenomic studies performed on cell populations, even when purified, are testing collections of epigenomes, or meta-epigenomes. Our findings show that meta-epigenomic approaches to data analysis can provide insights into cell subpopulation structure.

Facilitated diffusion buffers noise in gene expression

Armin Schoech, Nicolae Radu Zabet
(Submitted on 22 Jul 2014)

Transcription factors perform facilitated diffusion (3D diffusion in the cytosol and 1D diffusion on the DNA) when binding to their target sites to regulate gene expression. Here, we investigated the influence of this binding mechanism on the noise in gene expression. Our results showed that, for biologically relevant parameters, the binding process can be represented by a two-state Markov model and that the accelerated target finding due to facilitated diffusion leads to a reduction in both the mRNA and the protein noise.
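
As a rough illustration of why faster switching in a two-state model can reduce mRNA noise, the sketch below evaluates the standard steady-state mean and Fano factor of the two-state ("telegraph") model of gene expression. It is not taken from the paper: it assumes the gene is transcribed only in one of the two states, and it models accelerated target finding by scaling both switching rates by the same factor so that mean expression stays fixed. The function name, parameter names, and rate values are all hypothetical.

```python
def telegraph_mrna_stats(k_on, k_off, k_tx, k_deg):
    """Steady-state mRNA mean and Fano factor for a two-state promoter.

    k_on  : rate of switching into the transcribing state
    k_off : rate of switching out of the transcribing state
    k_tx  : transcription rate while in the transcribing state
    k_deg : mRNA degradation rate
    Uses the standard telegraph-model result:
        mean = (k_tx / k_deg) * k_on / (k_on + k_off)
        Fano = 1 + k_tx * k_off / ((k_on + k_off) * (k_on + k_off + k_deg))
    """
    switching = k_on + k_off
    mean = (k_tx / k_deg) * (k_on / switching)
    fano = 1.0 + (k_tx * k_off) / (switching * (switching + k_deg))
    return mean, fano


# Assumed rates in arbitrary units; "fast" multiplies both switching
# rates by 10, which leaves the mean unchanged.
slow_mean, slow_fano = telegraph_mrna_stats(0.1, 0.1, 10.0, 1.0)
fast_mean, fast_fano = telegraph_mrna_stats(1.0, 1.0, 10.0, 1.0)
```

With the mean held constant, the Fano factor (variance divided by mean) falls toward the Poisson value of 1 as switching becomes faster, which is the qualitative behavior the abstract describes.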

Assessing allele specific expression across multiple tissues from RNA-seq read data

Matti Pirinen, Tuuli Lappalainen, Noah A Zaitlen, GTEx Consortium, Emmanouil T Dermitzakis, Peter Donnelly, Mark I McCarthy, Manuel A Rivas

Motivation: RNA sequencing enables allele specific expression (ASE) studies that complement standard genotype expression studies for common variants and, importantly, also allow measuring the regulatory impact of rare variants. The Genotype-Tissue Expression project (GTEx) is collecting RNA-seq data on multiple tissues from the same set of individuals, and novel methods are required for the analysis of these data. Results: We present a statistical method to compare different patterns of ASE across tissues and to classify genetic variants according to their impact on the tissue-wide expression profile. We focus on strong ASE effects that we expect to see for protein-truncating variants, but our method can also be adjusted for other types of ASE effects. We illustrate the method with a real data example on a tissue-wide expression profile of a variant causal for lipoid proteinosis, and with a simulation study to assess our method more generally. Availability: MAMBA software: R source code and data examples: Contact:
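
For readers unfamiliar with ASE, the sketch below shows the most basic single-site, single-tissue test: an exact two-sided binomial test of whether reads carrying the reference and alternate alleles at a heterozygous site depart from an expected ratio. This is deliberately not the multi-tissue method the abstract describes, and it is not part of the MAMBA software; the function name `ase_binomial_pvalue` and the read counts are hypothetical, and the default 50:50 null ignores reference-mapping bias.

```python
from math import comb


def ase_binomial_pvalue(ref_reads, alt_reads, p_ref=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    ref_reads, alt_reads : read counts supporting each allele
    p_ref                : expected reference-allele fraction under no ASE
    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Intended for modest read depths only.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")

    def pmf(k):
        # Probability of seeing k reference reads out of n.
        return comb(n, k) * p_ref ** k * (1.0 - p_ref) ** (n - k)

    observed = pmf(ref_reads)
    # Small relative tolerance so ties are not lost to rounding.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Made-up counts: a balanced site and a site with complete imbalance,
# the kind of strong effect expected for a protein-truncating variant.
p_balanced = ase_binomial_pvalue(10, 10)
p_imbalanced = ase_binomial_pvalue(20, 0)
```

Running such a test independently in each tissue discards the sharing of information across tissues, which is the gap a joint model of tissue-wide ASE patterns is meant to fill.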

RNA-seq gene profiling – a systematic empirical comparison

Nuno A Fonseca, John A Marioni, Alvis Brazma

Accurately quantifying gene expression levels is a key goal of experiments using RNA-sequencing to assay the transcriptome. This typically requires aligning the short reads generated to the genome or transcriptome before quantifying expression of pre-defined sets of genes. Differences in the alignment/quantification tools can have a major effect upon the expression levels found, with important consequences for biological interpretation. Here we address two main issues: do different analysis pipelines affect the gene expression levels inferred from RNA-seq data? And, how close are the expression levels inferred to the “true” expression levels? We evaluate fifty gene profiling pipelines in experimental and simulated data sets with different characteristics (e.g., read length and sequencing depth). In the absence of knowledge of the ‘ground truth’ in real RNA-seq data sets, we used simulated data to assess the differences between the true expression and those reconstructed by the analysis pipelines. Even though this approach does not take into account all known biases present in RNA-seq data, it still allows us to assess the accuracy of the gene expression values inferred by different analysis pipelines. The results show that i) overall there is a high correlation between the expression levels inferred by the best pipelines and the true quantification values; ii) the error in the estimated gene expression values can vary considerably across genes; and iii) a small set of genes have expression estimates with consistently high error (across data sets and methods). Finally, although the mapping software is important, the quantification method makes a greater difference to the results.
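
The two summary measures mentioned above, overall correlation with the truth and per-gene error, can be sketched as follows for simulated data where the true expression of each gene is known. This is an illustrative assumption about how such a comparison might be computed, not the authors' evaluation code: the function names, the log2 transform, the pseudocount, and the toy values are all hypothetical.

```python
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def compare_to_truth(true_expr, est_expr, pseudocount=1.0):
    """Compare one pipeline's estimates against simulated true expression.

    true_expr, est_expr : dicts mapping gene ID -> expression value
    Returns (correlation on log2 scale, dict of per-gene relative error).
    Only genes present in both dicts are compared.
    """
    genes = sorted(set(true_expr) & set(est_expr))
    log_true = [log2(true_expr[g] + pseudocount) for g in genes]
    log_est = [log2(est_expr[g] + pseudocount) for g in genes]
    rel_err = {
        g: abs(est_expr[g] - true_expr[g]) / (true_expr[g] + pseudocount)
        for g in genes
    }
    return pearson(log_true, log_est), rel_err


# Made-up values for three genes; geneB is underestimated by half.
truth = {"geneA": 10.0, "geneB": 100.0, "geneC": 1000.0}
estimate = {"geneA": 10.0, "geneB": 50.0, "geneC": 1000.0}
r, errors = compare_to_truth(truth, estimate)
```

A high overall correlation can coexist with large errors for individual genes, so inspecting the per-gene error values alongside the correlation gives a fuller picture of a pipeline's accuracy.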


Jukka-Pekka Verta, Christian R Landry, John J MacKay

Heritable variation in gene expression is a source of evolutionary change, but our understanding of the genetic basis of expression variation remains incomplete. Here, we dissected the genetic basis of transcriptional variation in a wild, outbreeding gymnosperm (Picea glauca) according to linked and unlinked genetic variants, their allele-specific (cis) and allele non-specific (trans) effects, and their phenotypic additivity. We used a novel plant system that is based on the analysis of segregating alleles of a single self-fertilized plant in haploid and diploid seed tissues. We measured transcript abundance and identified transcribed SNPs in 66 seeds with RNA-seq. Linked and unlinked genetic effects that influenced expression levels were abundant in the haploid megagametophyte tissue, influencing 48% and 38% of analyzed genes, respectively. Analysis of these effects in diploid embryos revealed that while distant effects were acting in trans consistent with their hypothesized diffusible nature, local effects were associated with a complex mix of cis, trans and compensatory effects. Most cis effects were additive irrespective of their effect sizes, consistent with a hypothesis that they represent rate-limiting factors in transcript accumulation. We show that trans effects fulfilled a key prediction of Wright's physiological theory, in which variants with small effects tend to be additive and those with large effects tend to be dominant/recessive. Our haploid/diploid approach allows a comprehensive genetic dissection of expression variation and can be applied to a large number of wild plant species.