Patrick O’Connor, Dmitry Andreev, Pavel Baranov
Ribosome profiling is a promising technology for exploring gene expression. However, ribosome profiling data are characterized by a substantial number of outliers due to technical and biological factors. Here we introduce a simple computational method, Ribo-seq Unit Step Transformation (RUST), for the characterization of ribosome profiling data. We show that RUST is robust and outperforms conventional normalization techniques in the presence of sporadic noise. We used RUST to analyse 28 publicly available ribosome profiling datasets obtained from mammalian cells and tissues and from yeast. This revealed substantial protocol-dependent variation in the composition of footprint libraries. We selected a high-quality dataset to explore the mRNA features that affect local decoding rates and found that the identity of the amino acid encoded by the codon in the A-site is the major contributing factor, followed by the identity of the codon itself and then the amino acid in the P-site. We also found that bulky amino acids slow down ribosome movement when they occur within the peptide tunnel, and that proline residues may decrease or increase ribosome velocities depending on the context in which they occur. Moreover, we show that a few parameters obtained with RUST are sufficient for predicting experimental densities with high accuracy. Due to its robustness and low computational demand, RUST could be used for quick routine characterization of ribosome profiling datasets to assess their quality, as well as for the analysis of the relative impact of mRNA sequence features on local decoding rates.
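The abstract does not spell out the transformation itself, so the following is only a minimal sketch of the idea. It assumes that the unit step scores each codon 1 when its footprint density exceeds the mean density of its own ORF and 0 otherwise, and that footprints have already been assigned to A-site codons. The function names `unit_step` and `rust_ratios` and the input layout are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def unit_step(densities):
    """Binarize one ORF's per-codon footprint counts.

    Assumption: a codon scores 1 if its count exceeds the ORF mean, else 0.
    """
    if not densities:
        return []
    mean = sum(densities) / len(densities)
    return [1 if d > mean else 0 for d in densities]


def rust_ratios(profiles):
    """Average the step values per codon identity across ORFs.

    profiles: iterable of (codons, densities) pairs, one pair per ORF,
    where codons[i] is assumed to be the A-site codon for densities[i].
    """
    observed = defaultdict(int)  # sum of step values per codon identity
    total = defaultdict(int)     # number of occurrences per codon identity
    for codons, densities in profiles:
        for codon, step in zip(codons, unit_step(densities)):
            observed[codon] += step
            total[codon] += 1
    return {codon: observed[codon] / total[codon] for codon in total}


# A single extreme position contributes the same step value whether its
# count is 100 or 100000, which is the intuition behind the robustness claim.
moderate = unit_step([0, 0, 1, 100, 2, 1])
extreme = unit_step([0, 0, 1, 100000, 2, 1])
ratios = rust_ratios([(["AAA", "CCC", "AAA", "GGG"], [5, 0, 7, 0])])
```

Because every codon contributes at most 1 to the per-codon averages, a sporadic outlier cannot dominate them the way it can dominate a mean of raw or normalized counts.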