Hi all,
I have detected some interesting epigenetic regions in the genome (for example, differentially methylated regions, DMRs), and I have a set of genomic features (for example, I classify the whole genome into 3'UTR / 5'UTR / gene body / promoter / enhancer / junk region).
Different features have different total lengths (e.g. junk regions occupy the majority of the genome).
How can I tell whether my set of intervals (the DMRs) is enriched in one of the features? One interval can intersect several features. I know how to apply a proportion test, but I do not know how to normalise the number of intersections between features and intervals by the length of the features.
Careful with the idea of 'junk' regions. Methylation is thought to be heavily involved in the repression of transposable elements (TEs), which make up about 50% of the human genome and a substantial fraction of most other genomes as well. It's entirely possible that this junk region is in fact mostly composed of insulators/TEs.
You could try a hypergeometric test to test for enrichment.
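On the length normalisation asked about in the question: one simple option is to compare the fraction of DMR base pairs that fall in a feature with the fraction of the genome that the feature occupies. Below is a minimal Python sketch of that ratio. The function name and all numbers are hypothetical, and it assumes you have already computed the overlap in bp yourself (for instance with an interval-intersection tool).

```python
def fold_enrichment(overlap_bp, dmr_bp, feature_bp, genome_bp):
    """Observed / expected fraction of DMR bases inside a feature.

    observed = share of DMR bases that overlap the feature
    expected = share of the genome covered by the feature
    """
    observed = overlap_bp / dmr_bp
    expected = feature_bp / genome_bp
    return observed / expected


# Hypothetical numbers, for illustration only.
genome_bp = 3_000_000_000
dmr_bp = 1_000_000  # total length of all DMRs

# feature name -> (total feature length in bp, bp of DMRs overlapping it)
features = {
    "promoter": (60_000_000, 200_000),
    "junk": (1_500_000_000, 300_000),
}

for name, (feature_bp, overlap_bp) in features.items():
    ratio = fold_enrichment(overlap_bp, dmr_bp, feature_bp, genome_bp)
    print(name, round(ratio, 2))
```

A ratio above 1 suggests enrichment and below 1 depletion, but it is only an effect size, not a significance test. Because the overlap is counted in bp, a DMR that spans several features contributes to each of them in proportion to the overlap.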
I agree with everything; I called it "junk" only to avoid listing all 30 categories. It is clear to me how to use the hypergeometric test (e.g. https://www.geneprof.org/GeneProf/tools/hypergeometric.jsp ) when I have a control dataset, but I do not understand how to apply it with features and intervals only: what is the "Population", what is the "Sample", and what counts as a success in this case?
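One possible mapping, offered as an assumption rather than the only correct setup: tile the genome into fixed-size bins (say 1 kb). Then the "Population" is all bins, a "success" is a bin that overlaps the feature of interest, the "Sample" is the set of bins that overlap a DMR, and the observed successes are the bins that overlap both. The sketch below computes the upper-tail hypergeometric p-value in plain Python; the function names and the counts are hypothetical.

```python
import math


def log_choose(n, k):
    """log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric.

    M: population size        (all genome bins)
    n: successes in population (bins overlapping the feature)
    N: sample size            (bins overlapping a DMR)
    k: observed successes     (bins overlapping both)
    """
    lo = max(k, 0, N - (M - n))  # smallest feasible count that is >= k
    hi = min(n, N)               # largest feasible count
    if lo > hi:
        return 0.0
    log_denom = log_choose(M, N)
    logs = [
        log_choose(n, i) + log_choose(M - n, N - i) - log_denom
        for i in range(lo, hi + 1)
    ]
    peak = max(logs)  # factor out the largest term for numerical stability
    return min(1.0, math.exp(peak) * sum(math.exp(x - peak) for x in logs))


# Hypothetical counts, for illustration only.
p = hypergeom_upper_tail(k=20, M=30_000, n=600, N=100)
print("enrichment p-value:", p)
```

Two caveats on this setup: neighbouring bins are not independent draws, so the p-value tends to be optimistic, and with around 30 feature classes the results need a multiple-testing correction. A permutation approach that shuffles the DMR positions along the genome is a common alternative when the independence assumption is a concern.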