Carlos P. Roca, Susana I. L. Gomes, Mónica J. B. Amorim, Janeck J. Scott-Fordsmand
RNA-Seq and gene expression microarrays provide comprehensive profiles of gene activity by measuring the concentration of tens of thousands of mRNA molecules in a single assay. However, a lack of accuracy and reproducibility has hindered the application of these high-throughput technologies. A key challenge in the data analysis is the normalization of gene expression levels, which is required to make them comparable between samples. Current normalization approaches rest on the implicit assumption that most genes are not differentially expressed. Here we show that this assumption is unrealistic and likely results in a failure to detect numerous gene expression changes. We have devised a mathematical approach to normalization that makes no assumption of this sort. We have found that variation in gene expression is much greater than currently believed, and that it can be measured with available technologies. Our results also explain, at least partially, the problems encountered in transcriptomics studies. We expect this improvement in detection to help efforts to realize the full potential of gene expression profiling, especially in analyses of cellular processes involving complex modulations of gene expression, such as cell differentiation, toxic responses, and cancer.