Estimating transcription factor abundance and specificity from genome-wide binding profiles
Nicolae Radu Zabet, Boris Adryan
Comments: 39 pages, 25 figures, 10 tables
Subjects: Quantitative Methods (q-bio.QM)
The binding of transcription factors (TFs) is essential for gene expression. One important characteristic is the actual occupancy of a putative binding site in the genome. In this study, we propose an analytical model to predict genomic occupancy that incorporates the preferred target sequence of a TF in the form of a position weight matrix (PWM), DNA accessibility data (in the case of eukaryotes), the number of TF molecules expected to be bound to the DNA, and a parameter that modulates the specificity of the TF. Given actual occupancy data in the form of ChIP-seq profiles, we inferred the copy number and specificity of five Drosophila TFs during early embryonic development: Bicoid, Caudal, Giant, Hunchback and Kruppel. Our results suggest that these TFs have fewer DNA-bound molecules than previously assumed (in the range of tens to hundreds) and that, while Bicoid and Caudal display higher specificity, the other three TFs (Giant, Hunchback and Kruppel) display lower specificity in their binding, despite having PWMs with higher information content. This study gives further weight to earlier investigations into TF copy numbers, which suggest that a significant proportion of molecules are not bound to the DNA.
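
The following Python sketch illustrates how the four ingredients named in the abstract (PWM score, DNA accessibility, number of DNA-bound molecules, and a specificity parameter) could be combined into an occupancy prediction, and how the last two could then be inferred from an observed profile. It is only a minimal illustration under stated assumptions: the statistical-thermodynamics style formula, the grid search, and all names (`pwm_scores`, `occupancy`, `fit_parameters`, `n_bound`, `lam`) are hypothetical choices and are not taken from the abstract.

```python
import math

def pwm_scores(seq, pwm):
    """Score every window of seq by summing per-position PWM weights.

    pwm is a list of dicts, one per motif position, mapping base -> weight.
    """
    width = len(pwm)
    return [sum(pwm[k][seq[i + k]] for k in range(width))
            for i in range(len(seq) - width + 1)]

def occupancy(scores, accessibility, n_bound, lam):
    """Assumed occupancy model (illustrative, not the authors' stated equation).

    Each site gets a statistical weight a_j * exp(w_j / lam); lam scales the
    PWM score, so a larger lam flattens differences between sites (lower
    specificity). n_bound is the number of DNA-bound molecules.
    """
    weights = [a * math.exp(w / lam) for w, a in zip(scores, accessibility)]
    background = sum(weights)  # competition from all candidate sites
    return [n_bound * w / (n_bound * w + background) for w in weights]

def fit_parameters(observed, scores, accessibility, n_grid, lam_grid):
    """Grid search for the (n_bound, lam) pair minimising squared error."""
    def error(params):
        predicted = occupancy(scores, accessibility, *params)
        return sum((p - o) ** 2 for p, o in zip(predicted, observed))
    candidates = [(n, lam) for n in n_grid for lam in lam_grid]
    return min(candidates, key=error)

if __name__ == "__main__":
    # Toy inputs: hypothetical scores and accessibility for four sites.
    scores = [2.0, 0.5, -1.0, 1.0]
    accessibility = [1.0, 0.5, 1.0, 0.2]
    observed = occupancy(scores, accessibility, n_bound=50, lam=1.5)
    best = fit_parameters(observed, scores, accessibility,
                          n_grid=[10, 50, 100, 500],
                          lam_grid=[0.5, 1.0, 1.5, 2.0])
    print(best)
```

In this sketch the observed profile is synthetic, generated from the same assumed model, so the grid search simply recovers the parameters used to produce it. With real ChIP-seq data the profile would need normalisation and the fit would be approximate.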