The Functional Consequences of Variation in Transcription Factor Binding

Darren A. Cusanovich, Bryan Pavlovic, Jonathan K. Pritchard, Yoav Gilad
(Submitted on 18 Oct 2013)

One goal of human genetics is to understand how the information for precise and dynamic gene expression programs is encoded in the genome. The interactions of transcription factors (TFs) with DNA regulatory elements clearly play an important role in determining gene expression outputs, yet the regulatory logic underlying functional transcription factor binding is poorly understood. Many studies have focused on characterizing the genomic locations of TF binding, yet it is unclear to what extent TF binding at any specific locus has functional consequences with respect to gene expression output. To evaluate the context of functional TF binding we knocked down 59 TFs and chromatin modifiers in one HapMap lymphoblastoid cell line. We then identified genes whose expression was affected by the knockdowns. We intersected the gene expression data with transcription factor binding data (based on ChIP-seq and DNase-seq) within 10 kb of the transcription start sites of expressed genes. This combination of data allowed us to infer functional TF binding. On average, 14.7% of genes bound by a factor were differentially expressed following the knockdown of that factor, suggesting that most interactions between TF and chromatin do not result in measurable changes in gene expression levels of putative target genes. We found that functional TF binding is enriched in regulatory elements that harbor a large number of TF binding sites, at sites with predicted higher binding affinity, and at sites that are enriched in genomic regions annotated as active enhancers.
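
As a rough illustration of the summary statistic in the abstract (the share of a factor's bound genes that are differentially expressed after that factor is knocked down), here is a minimal Python sketch. The function name, gene identifiers, and toy sets are all hypothetical. This is not the authors' pipeline, which assigned ChIP-seq and DNase-seq binding events to genes and applied a formal differential expression test.

```python
def functional_fraction(bound_genes, de_genes):
    """Return the fraction of bound genes that are also differentially expressed."""
    bound = set(bound_genes)
    if not bound:
        return 0.0
    return len(bound & set(de_genes)) / len(bound)


# Hypothetical toy data: factor -> genes with binding near the TSS,
# and factor -> genes differentially expressed after that factor's knockdown.
bound = {
    "TF_A": {"g1", "g2", "g3", "g4"},
    "TF_B": {"g2", "g5"},
}
de = {
    "TF_A": {"g1", "g9"},
    "TF_B": {"g2", "g5", "g7"},
}

# Per-factor fraction of "functional" binding, then the average across factors.
fractions = {tf: functional_fraction(bound[tf], de.get(tf, set())) for tf in bound}
mean_fraction = sum(fractions.values()) / len(fractions)
```

In this toy setup a DE gene that is not bound (such as `g9`) does not affect the fraction, since the denominator is the set of bound genes only.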

9 thoughts on “The Functional Consequences of Variation in Transcription Factor Binding”

  1. This paper looks at the effect of reducing transcription factor levels on gene expression levels. The main conclusion is that, on average, only 15% of bound genes were differentially expressed, suggesting that transcription factor binding does not always result in measurable changes in gene expression levels.

    I found this work very interesting, as it extends previous results to a larger scale.

    However, among the 59 factors that were knocked down in the present paper, there are genuine transcription factors (IRF3, JUND, PAX5, …) but also general transcription factors (GTFs) such as GTF2B and TAF1, and cofactors such as the HAT p300 or the double-strand-break repair protein RAD21.

    I am wondering whether comparing the results by category might be more informative. To my knowledge, some of the targeted proteins do not have DNA-binding activity per se, so their knockdown may have a general rather than a specific effect on gene expression.

    Alexis Verger

  2. Alexis,
    Thanks for your comments! I agree that there is probably something to be gained by considering the data separately for the different categories of factors, as you suggest. However, the analyses we described in the manuscript were actually relatively consistent across these categories (TFs, GTFs, and cofactors/chromatin modifiers) – both in terms of the disparity between binding and differential expression AND in terms of the genomic annotations associated with binding near differentially expressed genes. Nonetheless, I think there are a lot of avenues left to explore with these data! Thanks again for your thoughtful comment.

    Best,
    Darren Cusanovich

  3. Very cool! To me, the most interesting bit would be to know what proportion of the binding events classed as non-functional here are truly noise (in any context) and what proportion of them would have significance in another cell type and another time with other necessary TFs present (that is to say, to what extent these results are just a reflection of regulation being combinatorial and thus exactly what you’d expect).

  4. Very interesting work, Darren.

    A few quick questions/thoughts:
    1. How much of the apparently non-functional binding is happening at so-called HOT regions? I’d be curious to see a modification of your figure 4A, where you quantify the fraction of functional binding vs. the number of factors bound at the locus. My guess: binding sites with very few and very many are less likely to be functional, while binding sites with an intermediate number of overlapping sites are more likely to be functional.

    2. I always worry about thresholding issues when making comparisons like these (in my own work too). Have you investigated other methods of defining your bound gene set (intersection vs. union of peak calls, more stringent thresholds) or differentially expressed gene set (Storey’s Pi1)? Or something else?

    3. I’d like to see a little more detail on the issues related to motif scores, conservation, and functionality. Obviously, resolution is an issue when thinking about conservation within ChIP peaks. Do the conservation scores for positions within the TF motifs differ? Is there a peak of conservation at the peak center and does that differ? What fraction of peaks with high scoring motifs is functional (you say there is an association, but a summary percentage would be of interest)?

    4. The examination of the informativeness of different features (e.g., chromHMM) is neat. But have you examined methods that combine multiple different features, for example chromHMM state, conservation, and motif? Either in the hierarchical model framework of JP et al or with a classifier approach?

    • First of all, thanks for your thoughts, Casey. I’m glad you enjoyed the manuscript and I appreciate the thoughtful comments. I’m sorry for taking so long to respond. I’ve had a crazy few weeks and I wanted to take some time to consider your comments carefully. Here are my responses:

      1. I thought the paper on “HOT” regions was very interesting; however, I don’t think these regions are an issue for my analysis. If I order the experiments by the fraction of bound genes that are DE in the knockdown (as you suggested), you can see that experiments with higher fractions of functional binding have a clear trend towards more overall binding (functional or not), while within each experiment there tends to be marginally more binding at differentially expressed genes (http://dx.doi.org/10.6084/m9.figshare.858780). Another way of looking at this is to bin the genes by the number of binding events and ask what fraction of genes in each bin is functionally bound in one or more experiments. Here too, the trend is that more overall binding leads to more functional binding. There’s no apparent drop-off for the bins with the most overall binding (http://dx.doi.org/10.6084/m9.figshare.858781). Perhaps our regulatory windows are not capturing many of the “HOT” regions. The median number of binding events in a 20kb window for an expressed gene based on our definitions is only 45. It’s also not clear to me how the “HOT” regions relate to human TSSs, given that the experiments were conducted in yeast and that the authors apparently used PolII ChIP-seq, rather than RNA-seq or CAGE-seq, to define highly expressed loci.

      2. I worry about thresholding too. I tried a couple of different binding definitions, including just aggregating binding data, taking the union of all binding events for each factor, and using the midpoint or the actual peak size; all of these seemed to lead to the same conclusions. I also used several different window sizes (1-20kb) for defining bound genes. For differential expression, I did not use the Storey method, but I did look at a relaxed DE threshold. The results seemed pretty robust to all of these different thresholds.

      3. I agree that ChIP peaks can cause some issues in defining conservation and PWM scores, but in the section of our manuscript dealing with PhastCons and PWM score we only considered the binding data from the DNase-seq experiments. These binding events are defined at specific instances of the binding motif and so are only ~10bp. I believe this should avoid most of the problems you describe.

      4. I did not model the different factors jointly using either a hierarchical or classifier approach. It’s worth considering though. Let me give it some more thought…
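
The binning analysis described in response 1 above (group genes by their number of binding events, then ask what fraction of genes in each bin is functionally bound) could be sketched roughly as follows. This is an illustrative sketch, not the code behind the linked figures. The function name, bin edges, gene names, and counts are all hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def fraction_functional_by_bin(binding_counts, functional_genes, bin_edges):
    """Bin genes by number of binding events and return, for each bin index,
    the fraction of genes that are functionally bound.

    bin_edges are sorted inclusive upper bounds; counts above the last edge
    fall into one extra, final bin.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for gene, count in binding_counts.items():
        b = bisect_left(bin_edges, count)  # index of the first edge >= count
        totals[b] += 1
        if gene in functional_genes:
            hits[b] += 1
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Hypothetical toy data: gene -> binding events in its regulatory window.
counts = {"g1": 3, "g2": 8, "g3": 20, "g4": 45, "g5": 120, "g6": 60}
# Genes bound by a factor AND differentially expressed in its knockdown,
# in one or more experiments.
functional = {"g2", "g4", "g5", "g6"}

# Bins: <=10 events, 11-50 events, >50 events.
by_bin = fraction_functional_by_bin(counts, functional, bin_edges=[10, 50])
```

A drop in the fraction for the highest bin would be the signature of heavily bound but non-functional regions. In this toy data the highest bin has the largest fraction instead.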

