Hello,
I have a BED file of ChIP-seq peaks and need to calculate the fold-enrichment of peaks that contain or intersect promoters, relative to the genome average. How can I do this?
I have tried GENOMATIX, which has this option, but it seems to calculate fold-enrichment incorrectly (it does not take into account the different lengths of my genomic regions). I have also tried GREAT and CHIP-ENRICH, but what both of them calculate is something like the distribution of distances from each peak's midpoint to the nearest gene, which is not what I need.
Do you know of any software that does this?
PS. I do not need to calculate read enrichment. I need the enrichment of the peaks in my BED file that intersect promoters, versus the number of peaks of the same size that would intersect promoters by chance, based on the genome-average probability of encountering a promoter.
Like this: XX% of my peaks intersect promoters. One would expect YY% of regions of the same lengths to intersect promoters by chance, so the promoter enrichment is XX/YY-fold over chance. That is the problem I am trying to solve. I am pretty sure it is already implemented in some software; please advise where!
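To make the XX/YY calculation concrete, here is a minimal Python sketch of one way it could be done. It is not taken from any of the tools mentioned in this thread. It assumes 0-based half-open BED coordinates, ignores strand, and assumes a random region is equally likely to start anywhere on a chromosome (no correction for gaps or mappability). All function and variable names are hypothetical. The expected fraction is computed per peak length, so peaks of different sizes are handled separately.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def chance_overlap_probability(length, promoters, chrom_sizes):
    """Probability that a uniformly placed region of `length` bp hits a promoter.

    Assumption: every start position that keeps the region inside its
    chromosome is equally likely.
    """
    hits = total = 0
    for chrom, size in chrom_sizes.items():
        n_starts = size - length + 1  # valid starts are 0 .. n_starts - 1
        if n_starts <= 0:
            continue
        total += n_starts
        # A region starting at s overlaps promoter [a, b) iff a - length < s < b.
        windows = [(max(a - length + 1, 0), min(b, n_starts))
                   for a, b in promoters.get(chrom, [])]
        windows = [(s, e) for s, e in windows if s < e]
        hits += sum(e - s for s, e in merge(windows))
    return hits / total if total else 0.0


def promoter_fold_enrichment(peaks, promoters, chrom_sizes):
    """Return (observed fraction, expected fraction, fold enrichment)."""
    def overlaps(chrom, start, end):
        return any(a < end and start < b for a, b in promoters.get(chrom, []))

    cache = {}  # expected probability per distinct peak length
    observed = expected = 0.0
    for chrom, start, end in peaks:
        observed += overlaps(chrom, start, end)
        length = end - start
        if length not in cache:
            cache[length] = chance_overlap_probability(length, promoters, chrom_sizes)
        expected += cache[length]
    observed /= len(peaks)
    expected /= len(peaks)
    return observed, expected, observed / expected


# Toy data, made up for illustration only.
chrom_sizes = {"chr1": 1000}
promoters = {"chr1": [(100, 200)]}
peaks = [("chr1", 150, 250), ("chr1", 500, 600)]

observed, expected, fold = promoter_fold_enrichment(peaks, promoters, chrom_sizes)
print(observed, expected, fold)
```

In the toy data, one of the two peaks overlaps the promoter, so XX is 50%. A 100 bp region placed at random has 199 overlapping start positions out of 901, so YY is about 22%. With real data the peaks, promoters and chromosome sizes would be read from files. An analytic expectation like this is only as good as its uniform-placement assumption, which is why the simulation-based tools discussed in this thread may be the better choice.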
Thanks
Nice suggestions, Ryan. @brentp and I have been discussing the IntervalStats approach from Chikina and Troyanskaya, as I have been interested in efficient ways to calculate the denominator of the P-value that is computed for each interval. Because I am lazy, how efficient is the existing implementation? Have you tried to implement this in pybedtools?
I haven't tried to implement it in pybedtools yet. It's been a while since I've run it, but I remember that one pairwise comparison took about as long as a 1000-iteration permutation test in pybedtools. I haven't looked closely at the actual implementation though, so I have no idea if or how much it can be improved.
Also, GAT has been suggested as another approach that uses simulation:
Thanks, didn't know about GAT. I added it to the list in the answer, and I'll have to try it out.
Thanks for plugging poverlap. It needs some love, but it is pretty flexible and can handle a lot of different null models. I wrote it while we were working on the paper you linked to.
@Ryan, Thank you very much for such an extensive list! I will try bedtools jaccard first.
@Aaron, Is there a way to call promoters for a given genome through bedtools, or do I need to create a file with promoters myself?