Hello everybody,
I am aware that there are similar questions on testing for significant overlap between lists of genes or ChIP-seq data, but I think my question is different enough to warrant its own post.
I have two lists of genomic regions. One list contains regions of fixed size (all are 100 kb). The other list has regions of variable size, ranging from a few to several megabases. I want to test whether these regions overlap more than expected by chance. I am not sure of the best way to calculate this, or whether scripts or software for it already exist.
One could ask whether there is any overlap at all between regions in the two lists (qualitative, e.g. two regions either overlap or they do not), or about the extent of the overlap between regions (quantitative, e.g. two regions overlap by 3,000 bp). At the moment I am most interested in the qualitative overlap and would like to know whether it is more than expected by chance.
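A minimal sketch of the two kinds of overlap, assuming regions are held as (chromosome, start, end) tuples with BED-style half-open coordinates. The names `overlaps`, `overlap_bp` and `count_hits` are hypothetical helpers, not from any library. The pairwise comparison is quadratic, so for genome-wide lists a sorted sweep, an interval tree, or a tool such as `bedtools intersect` would be faster.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open [start, end)
# like BED files.
Region = Tuple[str, int, int]


def overlaps(r: Region, s: Region) -> bool:
    """Qualitative overlap: True if the two regions share at least one base."""
    return r[0] == s[0] and r[1] < s[2] and s[1] < r[2]


def overlap_bp(r: Region, s: Region) -> int:
    """Quantitative overlap: number of bases shared by the two regions."""
    if r[0] != s[0]:
        return 0
    return max(0, min(r[2], s[2]) - max(r[1], s[1]))


def count_hits(a: List[Region], b: List[Region]) -> int:
    """Number of regions in `a` that overlap at least one region in `b`."""
    return sum(any(overlaps(r, s) for s in b) for r in a)


# Toy data (made up for illustration): three 100 kb regions, one large region.
fixed = [("chr1", 0, 100_000), ("chr1", 500_000, 600_000), ("chr2", 0, 100_000)]
variable = [("chr1", 97_000, 2_000_000)]

print(count_hits(fixed, variable))           # 2 of the 3 fixed regions are hit
print(overlap_bp(fixed[0], variable[0]))     # 3000 bp shared
```

Because coordinates are half-open, regions that merely abut (one ends at 10, the next starts at 10) do not count as overlapping.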
The best method I can think of at the moment is to randomly permute the positions of the regions while keeping their sizes. However, I think these permutations would have to take place in a constrained space equal to the genome of the species I am testing; otherwise the probability of overlap by chance cannot be estimated correctly. I think such a task is beyond my programming abilities, so if someone can point out an existing script or suggest an alternative solution, that would be much appreciated.
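A minimal sketch of such a constrained permutation test, assuming the chromosome lengths of the species are known (for example from a two-column chrom.sizes file) and that each fixed-size region is moved to a uniformly random position on its own chromosome while the variable-size list stays put. The statistic is the qualitative one: how many fixed-size regions touch at least one variable-size region. All function names and the toy genome are hypothetical. This null model ignores assembly gaps, centromeres and other unmappable sequence; `bedtools shuffle` with `-g` (genome file), `-chrom` and `-excl` performs the same kind of constrained shuffling on BED files and can exclude such regions.

```python
import random
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED


def overlaps(r: Region, s: Region) -> bool:
    return r[0] == s[0] and r[1] < s[2] and s[1] < r[2]


def count_hits(a: List[Region], b: List[Region]) -> int:
    """Number of regions in `a` overlapping at least one region in `b`."""
    return sum(any(overlaps(r, s) for s in b) for r in a)


def shuffle_regions(regions: List[Region], chrom_sizes: Dict[str, int],
                    rng: random.Random) -> List[Region]:
    """Move each region to a random position on the same chromosome,
    keeping its size and keeping it entirely inside the chromosome."""
    out = []
    for chrom, start, end in regions:
        size = end - start
        new_start = rng.randrange(0, chrom_sizes[chrom] - size + 1)
        out.append((chrom, new_start, new_start + size))
    return out


def permutation_test(a: List[Region], b: List[Region],
                     chrom_sizes: Dict[str, int],
                     n_perm: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """One-sided empirical p-value for 'more overlap than expected by chance'."""
    rng = random.Random(seed)
    observed = count_hits(a, b)
    null = [count_hits(shuffle_regions(a, chrom_sizes, rng), b)
            for _ in range(n_perm)]
    extreme = sum(1 for x in null if x >= observed)
    # Add-one correction so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Toy genome and regions (made up for illustration only).
chrom_sizes = {"chr1": 10_000_000, "chr2": 5_000_000}
fixed = [("chr1", 0, 100_000), ("chr1", 500_000, 600_000), ("chr2", 0, 100_000)]
variable = [("chr1", 97_000, 2_000_000)]

observed, p = permutation_test(fixed, variable, chrom_sizes, n_perm=1000)
print(observed, p)
```

The smallest p-value this can report is 1/(n_perm + 1), so more permutations are needed to resolve very small p-values. Shuffling the fixed-size list rather than the variable one is a design choice here; shuffling the other list, or both, gives a different null model.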
Thanks in advance!
Best,
Rubal
There's a good list of tools and a few detailed answers in this other question