How can I compute whether the co-occurrence of two TFBSs within a certain width in all promoters is larger than expected by chance?
6.5 years ago
JJ

Hi,

I have computed the number of co-occurrences of two TFBSs in all promoters in the human genome.

Previously, we discussed how to calculate whether the co-occurrence of two TFBSs is higher than one would expect by chance. This can be done with a hypergeometric distribution, using the principle of overlapping lists.
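For reference, the overlapping-lists test can be written as an upper-tail hypergeometric probability. The sketch below is minimal and uses only Python's standard library (`math.comb`). The function name `hypergeom_sf` and its parameter names are illustrative. It assumes each promoter counts once per TF, however many sites that TF has in it.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of promoters scanned
    K: promoters with at least one site for TF A
    n: promoters with at least one site for TF B
    k: promoters with sites for both (the observed overlap)
    """
    upper = min(K, n)  # the overlap cannot exceed either list
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy numbers: 10 promoters, A in 4, B in 5, both in 4.
print(hypergeom_sf(4, 10, 4, 5))  # = 6/252, about 0.024
```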

But now I am wondering how to compute whether the co-occurrence of two TFBSs within a certain width, say within 100 bp of one another, is higher than expected by chance. This would apply to all promoters, or even to the whole genome tiled in bins. These close co-occurrences are a subset of all the co-occurring ones.
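To get the "within 100 bp" subset, each promoter that has sites for both TFs can be classed by whether any A site lies within the chosen distance of any B site. This is a minimal sketch. It assumes each site position is a single coordinate on the same promoter, for example a motif midpoint. `has_pair_within`, `count_within` and the dictionary input layout are hypothetical.

```python
from bisect import bisect_left


def has_pair_within(sites_a, sites_b, max_dist):
    """True if any A site lies within max_dist bp of any B site."""
    b_sorted = sorted(sites_b)
    for a in sites_a:
        i = bisect_left(b_sorted, a)
        # The nearest B sites are the last one before a and the first one at/after a.
        for j in (i - 1, i):
            if 0 <= j < len(b_sorted) and abs(b_sorted[j] - a) <= max_dist:
                return True
    return False


def count_within(promoters, max_dist):
    """Split co-occurring promoters into (within max_dist, not within) counts.

    promoters: dict promoter ID -> (sites_a, sites_b), positions in bp.
    """
    within = not_within = 0
    for sites_a, sites_b in promoters.values():
        if not sites_a or not sites_b:
            continue  # both TFs must occur for a co-occurrence
        if has_pair_within(sites_a, sites_b, max_dist):
            within += 1
        else:
            not_within += 1
    return within, not_within


promoters = {"p1": ([100, 500], [180]), "p2": ([50], [400]), "p3": ([10], [])}
print(count_within(promoters, 100))  # (1, 1): p1 within 100 bp, p2 not, p3 skipped
```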

I would reckon that this is more informative than just evaluating the co-occurrence in general. TFBSs close to each other might indicate that the TFs binding them are more likely to act synergistically. Any ideas on how to handle this statistically?

My first thought was randomising. I would download all TF matrices and compute, for a number of random combinations of two TFs, both the co-occurrence and the co-occurrence within a certain width. Could I be on the right track here? Could I then do multiple Fisher's exact tests, something like this?

(each cell: number of promoters in which both TFs co-occur)
                      not within length l    within length l
my TFs                #                      #
random combinations   #                      #

and then pool the p-values somehow? Or is there an easier solution? I am grateful for any input!
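As an illustration, the table above can be tested with a one-sided Fisher's exact test, once per random pair. The p-values can then be pooled with Fisher's method (X = -2 Σ ln p, chi-square with 2k degrees of freedom). This is a standard-library sketch; the function names are hypothetical.

Fisher's method assumes the p-values are independent. Here every table reuses the same "my TFs" row, so that assumption is likely violated and the pooled p-value would be too optimistic. Treat this as a demonstration of the mechanics, not a recommendation.

```python
import math


def fisher_exact_greater(mine_within, mine_not, rand_within, rand_not):
    """One-sided Fisher's exact test on the 2x2 table

                       within l      not within l
        my TFs         mine_within   mine_not
        random pair    rand_within   rand_not

    Returns P(X >= mine_within) under the hypergeometric null that the
    within/not-within split does not depend on the row.
    """
    row1 = mine_within + mine_not
    col1 = mine_within + rand_within
    total = row1 + rand_within + rand_not
    upper = min(row1, col1)
    tail = sum(math.comb(col1, x) * math.comb(total - col1, row1 - x)
               for x in range(mine_within, upper + 1))
    return tail / math.comb(total, row1)


def fisher_combine(pvalues):
    """Fisher's method; assumes independent p-values, all > 0.

    Uses the closed-form chi-square survival function for even df:
    P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        series += term
    return math.exp(-half_x) * series


# Toy counts, not real data: one table per random pair, then pool.
p_each = [fisher_exact_greater(8, 2, 1, 9), fisher_exact_greater(8, 2, 3, 7)]
print(p_each, fisher_combine(p_each))
```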

Thanks,

Tags: sequence, genome

"So these are then a subset of the co-occurring ones."

If you have a set and a subset, you can think about the set as "background" and the subset as what you are interested in or are observing, and you could apply the hypergeometric or Fisher's Exact test to those two sets.


Thanks for your input. So how do I handle the multiple sets of random TF combinations? I have the set I am interested in and n random combinations of TFs, and for each combination I have a set and a subset. Should I average the set and subset counts beforehand and then do one Fisher's exact test? Or should I do multiple tests and then average the p-values?
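A common alternative to averaging either counts or p-values is an empirical, permutation-style p-value. You compute one statistic per random TF pair, for example the fraction of co-occurring promoters that have a pair within l. You then ask how often the random pairs reach the observed value. This technique is swapped in here and was not proposed in the thread. The function names and the toy counts below are hypothetical.

```python
def within_fraction(within, not_within):
    """Share of co-occurring promoters in which the two TFs lie within l."""
    total = within + not_within
    return within / total if total else 0.0  # no co-occurrence: report 0


def empirical_p(observed, null_values):
    """One-sided empirical p-value against the random TF pairs.

    Uses (1 + #null >= observed) / (1 + #null), so it is never exactly 0.
    """
    exceed = sum(1 for v in null_values if v >= observed)
    return (1 + exceed) / (1 + len(null_values))


# Toy counts as (within l, not within l); not real data.
observed = within_fraction(42, 58)
random_pairs = [(10, 90), (25, 75), (30, 70), (45, 55), (20, 80)]
null = [within_fraction(w, nw) for w, nw in random_pairs]
p = empirical_p(observed, null)  # one random pair reaches 0.42, so 2/6
print(observed, p)
```

With n random pairs the smallest attainable p-value is 1/(n + 1), so this only becomes informative with many random pairs.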

6.5 years ago

I'm not sure I fully understand your question, but maybe GAT and/or bedtools reldist could do the job.
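The sketch below gives some intuition for what bedtools reldist reports. For each A position flanked by B positions on both sides, it takes the distance to the nearer B position and divides it by the gap between the two flanking B positions. It works on single coordinates on one sequence and is not the tool's implementation, so use bedtools itself for real intervals. If A and B are placed independently, the values should be roughly uniform between 0 and 0.5. An excess near 0 suggests A sites lie closer to B sites than expected.

```python
from bisect import bisect_right


def relative_distances(points_a, points_b):
    """Relative distance of each A point to its two flanking B points.

    Simplified illustration: single coordinates, one sequence, and
    A points without a B neighbour on both sides are skipped.
    """
    b = sorted(points_b)
    out = []
    for a in points_a:
        i = bisect_right(b, a)  # b[i - 1] <= a < b[i]
        if i == 0 or i == len(b):
            continue  # not flanked on both sides
        left, right = b[i - 1], b[i]
        out.append(min(a - left, right - a) / (right - left))
    return out


print(relative_distances([150, 210, 50], [100, 200, 300]))  # [0.5, 0.1]
```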


Thank you for the links to those tools! They offer a different way of approaching the problem. However, I have only scanned the promoter regions for TFBSs, so all hits are at most as far apart as the length of the scanned promoter region. Wouldn't the tools therefore automatically produce a positive result? Would they only work on a genome-wide scale?
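One way to keep the promoter restriction in the null model is to compare against B sites re-placed at random within the same promoters. That way, a short promoter does not automatically count as a "hit". The sketch below keeps the A sites fixed and draws the same number of B positions uniformly within each promoter's length.

Uniform placement is an assumption. Real binding sites follow sequence composition, so a stricter null would shuffle the promoter sequence and rescan it. The names, the input layout and the toy promoters are hypothetical.

```python
import random
from bisect import bisect_left


def has_pair_within(sites_a, sites_b, max_dist):
    """True if any A site lies within max_dist bp of any B site."""
    b_sorted = sorted(sites_b)
    for a in sites_a:
        i = bisect_left(b_sorted, a)
        for j in (i - 1, i):  # nearest B sites on either side of a
            if 0 <= j < len(b_sorted) and abs(b_sorted[j] - a) <= max_dist:
                return True
    return False


def shuffled_within_counts(promoters, max_dist, n_shuffles=1000, seed=0):
    """Null counts of promoters that have an A-B pair within max_dist.

    promoters: dict promoter ID -> (length, sites_a, sites_b), with sites as
    0-based offsets into the promoter. A sites stay fixed; each B site is
    redrawn uniformly in [0, length).
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_shuffles):
        count = 0
        for length, sites_a, sites_b in promoters.values():
            if not sites_a or not sites_b:
                continue  # only promoters where both TFs occur
            shuffled_b = [rng.randrange(length) for _ in sites_b]
            if has_pair_within(sites_a, shuffled_b, max_dist):
                count += 1
        null.append(count)
    return null


promoters = {"p1": (1000, [120], [180]), "p2": (1000, [50], [700]), "p3": (50, [10], [40])}
observed = sum(has_pair_within(a, b, 100) for _, a, b in promoters.values() if a and b)
null = shuffled_within_counts(promoters, 100)
p = (1 + sum(c >= observed for c in null)) / (1 + len(null))
print(observed, p)
```

In the toy data, promoter p3 is only 50 bp long, so with l = 100 bp every shuffle still counts it as "within". That is the automatic positive the question worries about. Because p3 adds the same amount to the observed count and to every shuffled count, it does not create a spurious difference between them.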
