Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels

Patrick Deelen, Daria Zhernakova, Mark de Haan, Marijke van der Sijde, Marc Jan Bonder, Juha Karjalainen, K. Joeri van der Velde, Kristin M. Abbott, Jingyuan Fu, Cisca Wijmenga, Richard J. Sinke, Morris A. Swertz, Lude Franke
doi: http://dx.doi.org/10.1101/007633

Given the increasing number of RNA-seq samples in the public domain, we studied to what extent expression quantitative trait loci (eQTLs) and allele-specific expression (ASE) can be identified in public RNA-seq data while also deriving the genotypes from the RNA-seq reads. In total, 4,978 human RNA-seq runs, representing many different tissues and cell types, passed quality control. Even though these data originated from many different laboratories, samples reflecting the same cell type clustered together, suggesting that technical biases due to different sequencing protocols were limited. We derived genotypes from the RNA-seq reads and imputed non-coding variants. In a joint analysis of 1,262 samples combined, we identified cis-eQTL effects for 8,034 unique genes. Additionally, we observed strong ASE effects for 34 rare pathogenic variants, corroborating previously observed effects on the corresponding protein levels. Given the exponential growth in the number of publicly available RNA-seq samples, we expect this approach will become relevant for studying tissue-specific effects of rare pathogenic genetic variants.
