Hi all, I performed a transcription factor binding site (TFBS) enrichment analysis in R. Briefly, I have a co-regulated pathway in a disease state, and I wanted to see whether binding sites for any TFs were significantly enriched in the promoter sequences of the genes in this pathway. I took the 5000 bp upstream of the TSS of each gene and scanned it for binding sites for the 68 human motifs found in the JASPAR database. Using hypergeometric testing against a background of 20,000 genes (also 5000 bp upstream), I found 4 of the 68 TFs to be enriched. My question is: what can I do with this information next? I know that is probably a stupidly broad question, but I'm at a bit of a loss. I thought about trying to correlate the number of TFBSs with gene expression level, but the regulatory architecture is probably complex enough to make that relationship meaningless. I have access to plenty of gene expression data through various GEO datasets.
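For reference, the hypergeometric test described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the R code actually used: the function name `hypergeom_enrichment_p` is made up for illustration, and the counts in the example (2000 background promoters with a hit, 40 pathway genes, 12 of them with a hit) are hypothetical. Only the background size of 20,000 comes from the analysis above.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value, P(X >= k).

    N: number of background genes
    K: background genes whose promoter contains the motif
    n: number of genes in the pathway
    k: pathway genes whose promoter contains the motif
    """
    upper = min(n, K)
    total = comb(N, n)  # all ways to draw n genes from the background
    # Sum the probabilities of seeing k, k+1, ..., upper motif-containing genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts, for illustration only.
p_example = hypergeom_enrichment_p(k=12, n=40, K=2000, N=20000)
print(f"enrichment p-value: {p_example:.3g}")
```

Exact integer arithmetic is used so that the large binomial coefficients do not overflow. In R, `phyper(k - 1, K, N - K, n, lower.tail = FALSE)` computes the same upper tail.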
I also thought about trying to see whether there are any other pathways in which these TFBSs are enriched, but I'm not really sure how I would go about doing that without having specific pathways in mind. I was hoping for something a little more related to discovery than to simple hypothesis testing.
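One way to do this without picking pathways in advance is to run the same enrichment test over every gene set in a pathway collection and rank the results. The sketch below assumes the pathway memberships and the set of genes with a motif hit are already available as Python sets. All names (`rank_pathways`, `pathway_A`, the `g0`, `g1`, ... gene IDs) and the toy data are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value, P(X >= k)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def rank_pathways(pathways, motif_genes, background):
    """Test each pathway for motif enrichment and sort by p-value.

    pathways:    dict mapping pathway name -> set of gene IDs
    motif_genes: set of gene IDs whose promoter contains the motif
    background:  set of all gene IDs that were scanned
    """
    N = len(background)
    K = len(motif_genes & background)
    results = []
    for name, genes in pathways.items():
        genes = genes & background      # ignore genes that were never scanned
        n = len(genes)
        k = len(genes & motif_genes)    # pathway genes with a motif hit
        results.append((name, k, n, enrichment_p(k, n, K, N)))
    return sorted(results, key=lambda r: r[3])


# Toy data: 100 background genes, 20 of which carry the motif.
background = {f"g{i}" for i in range(100)}
motif_genes = {f"g{i}" for i in range(20)}
pathways = {
    "pathway_A": {f"g{i}" for i in range(10)},      # every gene has a hit
    "pathway_B": {f"g{i}" for i in range(50, 60)},  # no gene has a hit
}

ranked = rank_pathways(pathways, motif_genes, background)
for name, k, n, p in ranked:
    print(f"{name}: {k}/{n} genes with motif, p = {p:.3g}")
```

With a real collection of hundreds of pathways and 4 TFs, the resulting p-values would need a multiple-testing correction before any of them is read as a discovery.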
Any ideas?
Any help is appreciated, and sorry for the probably all too vague question.
Cheers,
Davy
Great answer! Regarding your second suggestion, do you have any pointers on how one might go about this? A paper where this was done, perhaps? Thanks again!
As the co-regulation would be a prediction that you'd need to substantiate, I can offer 2 ideas. 1) Look in the literature to see if there are reports of 2 TFs co-regulating genes in your system. Those papers can be the references for your manuscript. 2) Perform the experiments yourself. If you don't want to or cannot do those experiments, then refer to the literature for your set of TFs.