Hi,
I have computed the number of co-occurrences of two TFBSs in all promoters in the human genome.
Previously, we discussed how to test whether the co-occurrence of two TFBSs is higher than one would expect by chance, which can be done with a hypergeometric distribution using the principle of overlapping lists.
But now I am wondering how to test whether the co-occurrence of two TFBSs within a certain distance of each other, across all promoters (or even the whole genome tiled in bins), is higher than expected by chance. Let's say within 100 bp of one another in all promoters (or even the genome). These would then be a subset of the co-occurring ones.
I would reckon that this is more informative than just evaluating co-occurrence in general, as TFBSs close to each other might indicate that the TFs that bind them are more likely to act synergistically. Any ideas how to handle this statistically?
My first thought was randomising: downloading all TF matrices and computing the co-occurrence, and the co-occurrence within a certain distance, for a number of random combinations of two TFs. Could I be on the right track here? Could I then do multiple Fisher's exact tests? Something like this?
My TFs
# co-occurring in promoters not within length l
# co-occurring in promoters within length l
Random combinations
# co-occurring in promoters not within length l
# co-occurring in promoters within length l
And then pool the p-values somehow? Or is there an easier solution? I am grateful for any input!
Thanks,
If you have a set and a subset, you can think about the set as "background" and the subset as what you are interested in or are observing, and you could apply the hypergeometric or Fisher's Exact test to those two sets.
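As a rough illustration of that suggestion, here is a minimal Python sketch of a one-sided Fisher's exact test on the 2x2 table from the question (my TFs vs. one random combination, within vs. not within length l). The counts and function names are hypothetical and chosen only for illustration. The p-value is computed as a hypergeometric upper tail using the standard library only.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K 'successes', n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table
           within l   not within l
    mine      a            b
    random    c            d
    Tests whether 'within l' is enriched in the first row."""
    return hypergeom_upper_tail(a, a + b + c + d, a + c, a + b)

# Hypothetical counts, not real data.
p = fisher_one_sided(30, 70, 10, 90)
print(f"one-sided p = {p:.3g}")
```

If SciPy is available, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` should give the same one-sided p-value.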
Thanks for your input. So how do I go about the multiple sets of random combinations of TFs? I have the set I am interested in and n random combinations of TFs, and for each combination I have a set and a subset. Do I average the values for set and subset beforehand and then do one Fisher's exact test, or do I do multiple tests and then average the p-values?
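For comparison, here is a sketch of a third option that is neither of the two above: treating the n random combinations as an empirical null distribution and asking how often a random pair shows a "within l" fraction at least as high as the pair of interest. This is a commonly used permutation-style approach, not something established in this thread. All counts and function names below are hypothetical.

```python
def close_fraction(n_within, n_cooccurring):
    """Fraction of co-occurring promoters where the two sites lie within length l."""
    return n_within / n_cooccurring if n_cooccurring else 0.0

def empirical_p_value(observed, null_values):
    """Share of null values >= observed, with the usual +1 correction
    so that the estimate is never exactly zero."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)

# Hypothetical counts: (subset within l, full co-occurring set).
observed = close_fraction(30, 100)
random_pairs = [(10, 90), (12, 110), (8, 95), (35, 100), (9, 80)]
null = [close_fraction(w, t) for w, t in random_pairs]

p_emp = empirical_p_value(observed, null)
print(f"empirical p = {p_emp:.3f}")
```

With this approach the smallest attainable p-value is 1/(n+1), so the number of random combinations limits the resolution.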