I want to do a permutation test: randomly reposition (shuffle) a given set of genomic intervals and measure the intersection between the new coordinates and a specific genomic element.
Example:
- Different sets of genes: protein coding, pseudogenes, ncRNA - intervals that I want to shuffle;
- Genomic repeat L1 - coordinates are stable.
- For every gene set shuffle intervals, intersect and measure the overlap with L1 (I am using bedtools shuffle - "reposition each feature in the input BED file on a random chromosome at a random position").
Question - Which genomic regions to exclude from the "genome" (bedtools shuffle -g option) before shuffling gene intervals?
I was going to exclude gaps in the assembly.
But what about:
- All gene regions. If I am shuffling pseudogene intervals, should I exclude protein coding and ncRNA coordinates?
- All non-L1 RepeatMasker coordinates. Since Alu, LTR and DNA transposons aren't L1, there won't be any intersection with them?
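To make the procedure concrete, here is a minimal pure-Python sketch of the test I have in mind. It is not how bedtools shuffle is implemented; the function names (`merge`, `overlap_bp`, `shuffle`, `permutation_test`), the uniform choice of chromosome, the rejection-sampling loop and the toy coordinates are all my own assumptions for illustration. Intervals are half-open `(chrom, start, end)` tuples as in BED, and `excluded` is whatever ends up being masked (assembly gaps, other gene sets, etc.).

```python
import random
from collections import defaultdict


def merge(intervals):
    """Sort and merge overlapping or touching (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


def overlap_bp(query, target):
    """Base pairs of each query interval covered by target, summed over queries.

    The target is merged first so its bases are not counted twice; query
    intervals that overlap each other are still counted separately.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in merge(target):
        by_chrom[chrom].append((start, end))
    total = 0
    for chrom, start, end in query:
        for t_start, t_end in by_chrom[chrom]:
            total += max(0, min(end, t_end) - max(start, t_start))
    return total


def shuffle(intervals, chrom_sizes, excluded, rng, max_tries=1000):
    """Move every interval to a random place, keeping its width.

    Assumption of this sketch: the chromosome is drawn uniformly, then the
    start position; placements touching an excluded region are rejected.
    """
    chroms = list(chrom_sizes)
    shuffled = []
    for _chrom, start, end in intervals:
        width = end - start
        for _attempt in range(max_tries):
            chrom = rng.choice(chroms)
            if chrom_sizes[chrom] < width:
                continue
            new_start = rng.randint(0, chrom_sizes[chrom] - width)
            candidate = (chrom, new_start, new_start + width)
            if overlap_bp([candidate], excluded) == 0:
                shuffled.append(candidate)
                break
        else:
            raise RuntimeError("no valid placement found for an interval")
    return shuffled


def permutation_test(genes, l1, chrom_sizes, excluded, n_perm=1000, seed=0):
    """Compare the observed overlap with L1 against shuffled gene sets."""
    rng = random.Random(seed)
    observed = overlap_bp(genes, l1)
    null = [
        overlap_bp(shuffle(genes, chrom_sizes, excluded, rng), l1)
        for _ in range(n_perm)
    ]
    # One-sided empirical p-value with the usual +1 correction.
    p_value = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, null, p_value


# Toy data (hypothetical coordinates, for illustration only).
chrom_sizes = {"chr1": 1000, "chr2": 800}
l1 = [("chr1", 100, 200), ("chr2", 300, 450)]
pseudogenes = [("chr1", 120, 170), ("chr2", 400, 500)]
gaps = [("chr1", 500, 600)]

observed, null, p_value = permutation_test(
    pseudogenes, l1, chrom_sizes, excluded=gaps, n_perm=200
)
```

The only thing that changes between the options I am asking about is what goes into `excluded`: gaps only, gaps plus the other gene sets, or gaps plus all non-L1 repeats. Each choice gives a different null distribution.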
I am a bit confused about what you are trying to do here. You want to pick genomic coordinates at random (do you mean intervals? A coordinate is a single fixed point; an interval requires two points) and see if they overlap with repeats (L1)? In your example it seems like you have several types of genomic elements, and you are going to pick some at random and see if they overlap repeats? What do you mean by shuffling? Are you going to keep the width of each element the same and shift the elements around the genome, and do you want to exclude all functional regions?
I edited my question.