Question

How can I test whether the co-occurrence of two TFBSs within a certain width in all promoters is higher than expected by chance?

0

6.5 years ago

JJ ▴ 710

Hi,

I have computed the number of co-occurrences of two TFBSs in all promoters in the human genome.

Previously, we discussed how to calculate whether the co-occurrence of two TFBSs is higher than one would expect by chance, which can be done with a hypergeometric distribution using the principle of overlapping lists.

But now I am wondering how I can compute whether the co-occurrence of two TFBSs within a certain width in all promoters (or even the whole genome tiled in bins) is higher than expected by chance, say within 100 bp of one another in all promoters (or even the genome). These are then a subset of the co-occurring ones.

I would reckon that this is more informative than just evaluating the co-occurrence in general, as TFBSs close to each other might indicate that the TFs that bind to them are more likely to act synergistically. Any ideas how to handle this statistically?

My first thought was randomising: downloading all TF matrices and computing the co-occurrence, and the co-occurrence within a certain width, for a number of random combinations of two TFs. Could I be on the right track here? Could I then do multiple Fisher's exact tests, something like this?

my TFs:
    # co-occurring in promoters, not within length l
    # co-occurring in promoters, within length l
random combinations:
    # co-occurring in promoters, not within length l
    # co-occurring in promoters, within length l

and then pool the p values somehow? Or is there an easier solution? I am grateful for any input!!!
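The two counts per TF pair in the layout above could be obtained along these lines. This is only a minimal sketch, assuming the site positions are already available per promoter as plain integer coordinates. The names `count_cooccurrence`, `sites_a`, `sites_b` and `max_dist` are made up for illustration, and "within length l" is taken here to mean that the closest A-B pair in a promoter is at most l bp apart, which is one possible definition.

```python
def count_cooccurrence(sites_a, sites_b, max_dist):
    """Return (n_far, n_near) over promoters that contain both TFBSs.

    sites_a, sites_b: dict mapping promoter id -> list of site positions (bp).
    n_near: promoters where at least one A-B pair is <= max_dist bp apart.
    n_far:  promoters where both occur but no pair is that close.
    """
    n_far = n_near = 0
    # Only promoters present in both dicts can show co-occurrence.
    for promoter in sites_a.keys() & sites_b.keys():
        pos_a, pos_b = sites_a[promoter], sites_b[promoter]
        if not pos_a or not pos_b:
            continue
        closest = min(abs(a - b) for a in pos_a for b in pos_b)
        if closest <= max_dist:
            n_near += 1
        else:
            n_far += 1
    return n_far, n_near


# Toy data (hypothetical positions, not real TFBS hits).
sites_a = {"p1": [100, 900], "p2": [50], "p3": [400]}
sites_b = {"p1": [150], "p2": [700], "p4": [10]}
print(count_cooccurrence(sites_a, sites_b, 100))  # (1, 1)
```

Running the same function on each random TF combination would give the second pair of counts.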

Thanks,

sequence genome • 1.8k views

ADD COMMENT • link updated 6.5 years ago by dariober 15k • written 6.5 years ago by JJ ▴ 710

1

So these are then a subset of the co-occurring ones.

If you have a set and a subset, you can think about the set as "background" and the subset as what you are interested in or are observing, and you could apply the hypergeometric or Fisher's Exact test to those two sets.
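As a rough illustration of that suggestion, a one-sided Fisher's exact test can be written directly from the hypergeometric distribution. This sketch assumes the counts are arranged as a 2x2 table (pair of interest vs. a background, each split into "within l" and "not within l"), which is one reading of the set/subset idea and not necessarily the only one. The function name `fisher_greater` and its arguments are hypothetical.

```python
from math import comb


def fisher_greater(near_obs, far_obs, near_bg, far_bg):
    """One-sided Fisher's exact p-value for enrichment of 'near' counts.

    Computes P(X >= near_obs) where X is hypergeometric: drawing
    (near_obs + far_obs) items from a pool that contains
    (near_obs + near_bg) 'near' items in total.
    """
    n_obs = near_obs + far_obs
    total_near = near_obs + near_bg
    total = n_obs + near_bg + far_bg
    upper = min(n_obs, total_near)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(
        comb(total_near, k) * comb(total - total_near, n_obs - k)
        for k in range(near_obs, upper + 1)
    )
    return tail / comb(total, n_obs)


# Toy table: 3 near / 1 far for the pair of interest, 1 near / 3 far background.
p_value = fisher_greater(3, 1, 1, 3)
print(p_value)
```

If SciPy is available, `scipy.stats.fisher_exact` with `alternative="greater"` should give the same kind of one-sided p-value.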

ADD REPLY • link 6.5 years ago by Alex Reynolds 35k

0

Thanks for your input. So how do I go about the multiple sets of random combinations of TFs? I have the set I am interested in and n random combinations of TFs, and for each combination I have a set and a subset. Do I average the values for set and subset beforehand and then do one Fisher's exact test, or do I do multiple tests and then average the p values?
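One option that sidesteps both averaging counts and averaging p values is an empirical p-value: treat the statistic from each random combination as a draw from the null distribution and ask how often it is at least as extreme as the observed one. This is a sketch of that alternative, not something proposed in the thread. It assumes the statistic is the fraction of co-occurring promoters that fall within length l, and `empirical_p` and the toy numbers are hypothetical.

```python
def empirical_p(observed, null_values):
    """Empirical one-sided p-value with the usual +1 correction.

    observed:    statistic for the TF pair of interest.
    null_values: the same statistic for each random TF combination.
    """
    exceed = sum(1 for v in null_values if v >= observed)
    # +1 in numerator and denominator keeps the p-value above zero.
    return (exceed + 1) / (len(null_values) + 1)


# Toy example: fraction of co-occurring promoters with sites within l.
observed_fraction = 0.40
random_fractions = [0.10, 0.15, 0.22, 0.41, 0.08, 0.12, 0.19, 0.05, 0.30]
print(empirical_p(observed_fraction, random_fractions))  # 0.2
```

The smallest p-value this can return is 1 / (n + 1), so the number of random combinations limits the resolution.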

ADD REPLY • link 6.5 years ago by JJ ▴ 710

score 1 · Answer 1 · 2018-05-15

1

6.5 years ago

dariober 15k

I'm not sure I fully understand your question but maybe GAT and/or bedtools reldist could do the job...

ADD COMMENT • link 6.5 years ago by dariober 15k

0

Thank you for the links to those tools! They provide a different way of approaching the problem. However, if I have evaluated only the promoter regions for TFBSs, wouldn't they automatically produce a positive result, since all hits are at least as close as the length of the promoter region I have evaluated? Would they only work on a genome-wide scale?
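One way to address that concern is to restrict the randomisation to the promoters themselves, so that the null already accounts for all sites lying inside a promoter. The sketch below shuffles the positions of one TFBS uniformly within each promoter and recounts the promoters with a close pair. It is an illustration under simplifying assumptions (all promoters have the same length, positions are 0-based offsets, sites are placed uniformly), and `n_near`, `shuffled_null` and their parameters are hypothetical names.

```python
import random


def n_near(sites_a, sites_b, max_dist):
    """Number of promoters with at least one A-B pair <= max_dist bp apart."""
    return sum(
        1
        for p in sites_a.keys() & sites_b.keys()
        if any(abs(a - b) <= max_dist for a in sites_a[p] for b in sites_b[p])
    )


def shuffled_null(sites_a, sites_b, promoter_len, max_dist, n_iter=1000, seed=0):
    """Null distribution of n_near with B sites re-placed within each promoter."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_iter):
        # Keep the number of B sites per promoter, randomise their offsets.
        shuffled_b = {
            p: [rng.randrange(promoter_len) for _ in pos]
            for p, pos in sites_b.items()
        }
        null.append(n_near(sites_a, shuffled_b, max_dist))
    return null


# Toy data: offsets within 1000 bp promoters (hypothetical).
sites_a = {"p1": [100, 900], "p2": [50], "p3": [400]}
sites_b = {"p1": [150], "p2": [700], "p3": [420]}
observed = n_near(sites_a, sites_b, 100)
null = shuffled_null(sites_a, sites_b, promoter_len=1000, max_dist=100)
p_value = (sum(v >= observed for v in null) + 1) / (len(null) + 1)
print(observed, p_value)
```

Because both the observed and the shuffled sites are confined to the same promoters, proximity that is merely a consequence of promoter length appears in the null as well.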

ADD REPLY • link 6.5 years ago by JJ ▴ 710