Abundant contribution of short tandem repeats to gene expression variation in humans
Melissa Gymrek, Thomas Willems, Haoyang Zeng, Barak Markus, Mark J Daly, Alkes L Price, Jonathan Pritchard, Yaniv Erlich
Expression quantitative trait loci (eQTLs) are a key tool to dissect cellular processes mediating complex diseases. However, little is known about the role of repetitive elements as eQTLs. We report a genome-wide survey of the contribution of Short Tandem Repeats (STRs), one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from linked SNPs and indels and found that eSTRs contribute 10%-15% of the cis-heritability mediated by all common variants. Functional genomic analyses showed that eSTRs are enriched in conserved regions, co-localize with regulatory elements, and are predicted to modulate histone modifications. Our results show that eSTRs provide a novel set of regulatory variants and highlight the contribution of repeats to the genetic architecture of quantitative human traits.
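To make the idea of an expression STR concrete, the following is a minimal sketch of a single-locus association test. It regresses expression on STR dosage and reports the slope and the fraction of expression variance explained. It assumes that STR dosage is coded as the summed repeat length of an individual's two alleles, which the abstract does not state. The function name and toy data are hypothetical, and this is not the variance-partitioning analysis the abstract describes, which also has to account for linked SNPs and indels.

```python
def ols_slope_r2(dosage, expression):
    """Least-squares slope and r^2 of expression regressed on STR dosage.

    dosage: assumed coding, summed repeat length of the two alleles.
    expression: normalized expression level of one gene per individual.
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and expression must both vary")
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, r2


# Hypothetical toy data: six individuals, one STR, one gene.
toy_dosage = [20, 22, 22, 24, 26, 28]
toy_expression = [0.9, 1.1, 1.0, 1.3, 1.4, 1.7]
slope, r2 = ols_slope_r2(toy_dosage, toy_expression)
print(f"slope per repeat unit: {slope:.3f}, variance explained: {r2:.2f}")
```

In a real analysis the r^2 from such a regression would overstate the STR's own contribution whenever nearby variants are in linkage disequilibrium with it, which is why the abstract's partitioning step matters.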
Entire genome transcription across evolutionary time exposes non-coding DNA to de novo gene emergence
Rafik Neme, Diethard Tautz
Even in the best-studied mammalian genomes, less than 5% of the total genome length is annotated as exonic. However, deep sequencing analysis in humans has shown that around 40% of the genome may be covered by poly-adenylated non-coding transcripts occurring at low levels. Their functional significance is unclear, and it has been disputed whether they should be considered noise of the transcriptional machinery. We propose that if such transcripts show some evolutionary stability, they will serve as substrates for de novo gene evolution, i.e. gene emergence out of non-coding DNA. Here, we characterize the phylogenetic turnover of low-level poly-adenylated transcripts in a comprehensive sampling of populations, sub-species and species of the genus Mus, spanning a phylogenetic distance of about 10 Myr. We find evidence for more evolutionarily stable gains of transcription than losses among closely related taxa, balanced by a loss of older transcripts across the whole phylogeny. We show that adding taxa increases the genomic transcript coverage and that no major transcript-free islands exist over time. This suggests that the entire genome can be transcribed into poly-adenylated RNA when viewed at an evolutionary time scale. Thus, any part of the “non-coding” genome can become subject to evolutionary functionalization via de novo gene evolution.
RNAseq in the mosquito maxillary palp: a little antennal RNA goes a long way
David C. Rinker, Xiaofan Zhou, Ronald Jason Pitts, Antonis Rokas, LJ Zwiebel
A comparative transcriptomic study of mosquito olfactory tissues recently published in BMC Genomics (Hodges et al., 2014) reported several novel findings that have broad implications for the field of insect olfaction. In this brief commentary, we outline why the conclusions of Hodges et al. are problematic under the current models of insect olfaction and then contrast their findings with those of other RNAseq-based studies of mosquito olfactory tissues. We also generated a new RNAseq data set from the maxillary palp of Anopheles gambiae in an effort to replicate the novel results of Hodges et al. but were unable to reproduce their results. Instead, our new RNAseq data support the more straightforward explanation that the novel findings of Hodges et al. were a consequence of contamination by antennal RNA. In summary, we find strong evidence to suggest that the conclusions of Hodges et al. were spurious, and that at least some of their RNAseq data sets were irrevocably compromised by cross-contamination between samples.
Exploring functional variation affecting ceRNA regulation in humans
Mulin Jun Li, Jiexing Wu, Peng Jiang, Wei Li, Yun Zhu, Daniel Fernandez, Russell J. H. Ryan, Yiwen Chen, Junwen Wang, Jun S. Liu, X. Shirley Liu
MicroRNA (miRNA) sponges have been shown to function as competing endogenous RNAs (ceRNAs) to regulate the expression of other miRNA targets in the network by sequestering available miRNAs. As the first systematic investigation of the genome-wide genetic effect on ceRNA regulation, we applied multivariate response regression to RNA-seq data from 462 Geuvadis samples across multiple human populations and identified widespread genetic variations that are associated with ceRNA competition. We showed that SNPs in gene 3’UTRs at the miRNA seed binding regions can simultaneously regulate gene expression changes in both cis and trans by the ceRNA mechanism. We termed these loci endogenous miRNA sponge expression quantitative trait loci or “emsQTLs”, and found that a large number of them were unexplored in conventional eQTL mapping. We found that many emsQTLs are undergoing recent positive selection in different human populations. Using GWAS results, we found that emsQTLs are significantly enriched in trait- and disease-associated loci. Functional prediction and prioritization extend our understanding of the causality of emsQTL alleles in disease pathways. We illustrated that an emsQTL can synchronously regulate the expression of a tumor suppressor and an oncogene through ceRNA competition in angiogenesis. Together these results provide a distinct catalog and characterization of functional noncoding regulatory variants that control ceRNA crosstalk.
Transcriptome Differences between Alternative Sex Determining Genotypes in the House Fly, Musca domestica
Richard P Meisel, Jeffrey G Scott, Andrew G Clark
Sex determination evolves rapidly, often because of turnover of the genes at the top of the pathway. The house fly, Musca domestica, has a multifactorial sex determination system, allowing us to identify the selective forces responsible for the evolutionary turnover of sex determination in action. There is a male determining factor, M, on the Y chromosome (YM), which is probably the ancestral state. An M factor on the third chromosome (IIIM) has reached high frequencies in multiple populations across the world, but the evolutionary forces responsible for the invasion of IIIM are not resolved. To test if the IIIM chromosome invaded because of sex-specific selection pressures, we used mRNA sequencing to determine if isogenic males that differ only in the presence of the YM or IIIM chromosome have different gene expression profiles. We find that more genes are differentially expressed between YM and IIIM males in testis than in head, and that genes with male-biased expression are most likely to be differentially expressed between YM and IIIM males. This suggests that male phenotypes, especially those related to male fertility, are more likely to be affected by the male-determining chromosome, supporting the hypothesis that sex-specific selection acts on alleles linked to the male-determining locus, driving evolutionary turnover in the sex determination pathway. We additionally find that IIIM males have a “masculinized” gene expression profile, suggesting that the IIIM chromosome has accumulated an excess of male-beneficial alleles because of its male-limited transmission.
Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions
Alicia Schep, Jason D Buenrostro, Sarah K Denny, Katja Schwartz, Gavin Sherlock, William J Greenleaf
Transcription factors canonically bind nucleosome-free DNA, making the positioning of nucleosomes within regulatory regions crucial to the regulation of gene expression. We observe a highly structured pattern of DNA fragment lengths and positions generated by the assay of transposase accessible chromatin (ATAC-seq) around nucleosomes in S. cerevisiae, and use this distinctive two-dimensional nucleosomal “fingerprint” as the basis for a new nucleosome-positioning algorithm called NucleoATAC. We show that NucleoATAC can identify the rotational and translational positions of nucleosomes with up to base pair resolution and provide quantitative measures of nucleosome occupancy in S. cerevisiae, S. pombe, and human cells. We demonstrate application of NucleoATAC to a number of outstanding problems in chromatin biology, including analysis of sequence features underlying nucleosome positioning, promoter chromatin architecture across species, identification of transient changes in nucleosome occupancy and positioning during a dynamic cellular response, and integrated analysis of nucleosome occupancy and transcription factor binding.
Sex chromosome dosage compensation in Heliconius butterflies: global yet still incomplete?
James R Walters, Thomas J Hardcastle, Chris Jiggins
The evolution of heterogametic sex chromosomes is often, but not always, accompanied by the evolution of dosage compensating mechanisms that mitigate the impact of sex-specific gene dosage on levels of gene expression. One emerging view of this process is that such mechanisms may only evolve in male-heterogametic (XY) species but not in female-heterogametic (ZW) species, which will consequently exhibit “incomplete” sex chromosome dosage compensation. However, some recent results from moths suggest that Lepidoptera (moths and butterflies) may prove to be an exception to this prediction. Here we report an analysis of sex chromosome dosage compensation in Heliconius butterflies, sampling multiple individuals for several different adult tissues (head, abdomen, leg, mouth, and antennae). Methodologically, we introduce a novel application of linear mixed-effects models to assess dosage compensation, offering a unified statistical framework that can estimate effects specific to chromosome, to sex, and their interactions (i.e., a dosage effect). Our results show substantially reduced Z-linked expression relative to autosomes in both sexes, as previously observed in bombycoid moths. This observation is consistent with an increasing body of evidence that at least some species of moths and butterflies possess an epigenetic sex chromosome dosage compensating mechanism that operates by reducing Z chromosome expression in males. However, this mechanism appears to be imperfect in Heliconius, resulting in a modest dosage effect that produces an average 5-20% male-bias on the Z chromosome, depending on the tissue. Strong sex chromosome dosage effects have been previously reported in a pyralid moth. Thus our results reflect a mixture of previous patterns reported for Lepidoptera and bisect the emerging view that female-heterogametic ZW taxa have incomplete dosage compensation because they lack a chromosome-wide epigenetic mechanism mediating sex chromosome dosage compensation. In the case of Heliconius, sex chromosome dosage effects apparently persist despite such a mechanism. We also analyze chromosomal distributions of sex-biased genes and show an excess of male-biased and a dearth of female-biased genes on the Z chromosome relative to autosomes, consistent with predictions of sexually antagonistic evolution.
Tools and best practices for allelic expression analysis
Stephane E Castel, Ami Levy-Moonshine, Pejman Mohammadi, Eric Banks, Tuuli Lappalainen
Allelic expression (AE) analysis has become an important tool for integrating genome and transcriptome data to characterize various biological phenomena such as cis-regulatory variation and nonsense-mediated decay. In this paper, we systematically analyze the properties of AE read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, technical covariates due to sample preparation and sequencing, and variation in total read depth. We provide guidelines for correcting and filtering for such errors, and show that the resulting AE data has extremely low technical noise. Finally, we introduce novel software for high-throughput production of AE data from RNA-sequencing data, implemented in the GATK framework. These improved tools and best practices for AE analysis yield higher quality AE data by reducing technical bias. This provides a practical framework for wider adoption of AE analysis by the genomics community.
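Per-site AE read counts are commonly screened for allelic imbalance with a two-sided binomial test, and the sketch below illustrates that idea using only the Python standard library. It is a generic illustration, not the GATK-based software the abstract introduces. The function names and the `null_ref_ratio` parameter are hypothetical. The parameter is there on the assumption that a null reference ratio other than 0.5 may be needed to absorb residual mapping bias.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def allelic_imbalance_pvalue(ref_count, alt_count, null_ref_ratio=0.5):
    """Exact two-sided binomial p-value for reference-allele read counts.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null reference ratio.
    """
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("site has no reads")
    pmfs = [binom_pmf(k, n, null_ref_ratio) for k in range(n + 1)]
    # Small relative tolerance so ties are not lost to rounding error.
    cutoff = pmfs[ref_count] * (1 + 1e-7)
    return min(1.0, sum(p for p in pmfs if p <= cutoff))


# Hypothetical heterozygous site: 30 reference reads, 10 alternate reads.
print(allelic_imbalance_pvalue(30, 10))
```

The test is only meaningful after the filtering steps listed in the abstract, since double-counted reads or genotyping errors inflate apparent imbalance.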
Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads
Hung-I Harry Chen, Yuanhang Liu, Yi Zou, Zhao Lai, Devanand Sarkar, Yufei Huang, Yidong Chen
Background: RNA sequencing (RNA-seq) is a powerful tool for genome-wide expression profiling of biological samples, with the advantages of high throughput and high resolution. Many algorithms exist for quantifying expression levels and detecting differential gene expression, but none of them takes into account the misaligned reads that are mapped to non-exonic regions. We developed a novel algorithm, XBSeq, in which a statistical model was established based on the assumption that observed signals are the convolution of true expression signals and sequencing noise. The mapped reads in non-exonic regions are considered sequencing noise, which follows a Poisson distribution. Given measurable observed and noise signals from RNA-seq data, true expression signals, assumed to be governed by the negative binomial distribution, can be delineated, enabling accurate detection of differentially expressed genes. Results: We implemented our novel XBSeq algorithm and evaluated it using a set of simulated expression datasets under different conditions, generated from a combination of negative binomial and Poisson distributions with parameters derived from real RNA-seq data. We compared the performance of our method with other commonly used differential expression analysis algorithms. We also evaluated the changes in true and false positive rates with variations in biological replicates, differential fold changes, and expression levels in non-exonic regions. We also tested the algorithm on a set of real RNA-seq data, where the common and different detection results from different algorithms were reported. Conclusions: In this paper, we proposed XBSeq, a novel differential expression analysis algorithm for RNA-seq data that takes non-exonic mapped reads into consideration. When background noise is at baseline level, the performance of XBSeq and DESeq is mostly equivalent. However, our method surpasses DESeq and other algorithms as the number of non-exonic mapped reads increases. Only under very low read count conditions did XBSeq have a slightly higher false discovery rate, which may be improved by adjusting the background noise effect in this situation. Taken together, by considering non-exonic mapped reads, XBSeq can provide accurate expression measurement and thus detect differentially expressed genes even in noisy conditions.
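The signal-plus-noise model in the abstract can be illustrated with a simple moment calculation. If the observed count is the sum of an independent true signal and background noise, their means and variances add, so the moments of the true signal can be approximated by subtraction. The sketch below does only that, and then converts the result to a negative binomial dispersion under the common parameterization var = mu + phi * mu^2. It is a simplified illustration and should not be read as XBSeq's actual estimation procedure. The function names and counts are hypothetical.

```python
from statistics import mean, variance


def true_signal_moments(observed, background):
    """Approximate mean and variance of the true signal by subtraction.

    observed: per-replicate read counts in the exonic region of one gene.
    background: per-replicate read counts in matched non-exonic regions.
    Assumes signal and noise are independent, so moments are additive.
    """
    mu = mean(observed) - mean(background)
    var = variance(observed) - variance(background)
    return max(mu, 0.0), max(var, 0.0)  # clamp: moments cannot be negative


def nb_dispersion(mu, var):
    """Dispersion phi for a negative binomial with var = mu + phi * mu**2."""
    if mu <= 0:
        return 0.0
    return max((var - mu) / mu ** 2, 0.0)


# Hypothetical counts for one gene across four replicates.
observed_counts = [120, 95, 150, 110]
background_counts = [10, 8, 12, 10]
mu_s, var_s = true_signal_moments(observed_counts, background_counts)
print(mu_s, var_s, nb_dispersion(mu_s, var_s))
```

With few replicates the subtracted variance is noisy and can go negative, which is why the sketch clamps at zero. This is consistent with the abstract's remark that very low read counts are the hardest case.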
Pathway based factor analysis of gene expression data produces highly heritable phenotypes that associate with age
Andrew Anand Brown, Zhihao Ding, Ana Viñuela, Dan Glass, Leopold Parts, Timothy Spector, John Winn, Richard Durbin
Statistical factor analysis methods have previously been used to remove noise components from high dimensional data prior to genetic association mapping, and in a guided fashion to summarise biologically relevant sources of variation. Here we show how the derived factors summarising pathway expression can be used to analyse the relationships between expression, heritability and ageing. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarise patterns of gene expression, both to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 “pathway phenotypes” which summarised patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P<5.38E-5). These phenotypes are more heritable (h^2=0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolising sugars and fatty acids, others to insulin signalling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors.
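The Bonferroni threshold quoted above can be checked from the numbers in the abstract. The pathway and phenotype counts are taken from the text; the family-wise error rate of 0.05 is an assumption, chosen because it reproduces the stated threshold.

```python
# Counts stated in the abstract.
N_PATHWAYS = 186
PHENOTYPES_PER_PATHWAY = 5

# Assumed family-wise error rate; not stated explicitly in the abstract.
ALPHA = 0.05


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


n_tests = N_PATHWAYS * PHENOTYPES_PER_PATHWAY
threshold = bonferroni_threshold(ALPHA, n_tests)
print(n_tests)               # 930 pathway phenotypes
print(f"{threshold:.2E}")    # 5.38E-05, matching P<5.38E-5
```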