I am using the GenomicRanges package to analyse genomic segment enrichment. The first three columns in my dataset are chr, start and end, followed by three additional metadata columns. All columns are tab-separated.
I have successfully run subsetByOverlaps(cases, controls, type="within", invert=TRUE). According to what I have read, the output should be the genomic segments in my cases that do not fall within any control segment, i.e. segments exclusive to my cases. Conversely, I also ran subsetByOverlaps(controls, cases, type="within", invert=TRUE) to look for segments exclusive to controls. I then looked for segments found in both by removing the invert option. In one instance my queryLength was approximately 4000 segments and my subjectLength was just over 200 segments. When I run subsetByOverlaps(cases, controls, type="within") on this data, I get more than 200 segments in the resulting GRanges object. Am I missing something about the behaviour of the function? I expected the output to contain fewer than 200 segments, assuming that the segments are treated as sets.
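To make the counts concrete, here is a minimal sketch of the calls described above. The coordinates and object contents are made up for illustration and are not the real data; the sketch only assumes that subsetByOverlaps() returns a subset of its first argument (the query).

```r
library(GenomicRanges)

# Hypothetical toy data: four case segments and a single control segment.
cases <- GRanges("chr1",
                 IRanges(start = c(10, 20, 30, 200),
                         end   = c(15, 25, 35, 250)))
controls <- GRanges("chr1", IRanges(start = 1, end = 100))

# Case segments lying wholly within some control segment.
shared <- subsetByOverlaps(cases, controls, type = "within")

# Case segments NOT lying within any control segment.
only_cases <- subsetByOverlaps(cases, controls, type = "within", invert = TRUE)

length(controls)    # 1
length(shared)      # 3
length(only_cases)  # 1
```

In this toy case the result holds three ranges even though there is only one control segment, because the ranges returned are taken from cases. The result size is therefore bounded by the number of query segments, not by the number of subject segments.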
My second question: if I then swap the cases and controls and run subsetByOverlaps(controls, cases, type="within"), how can I combine the data from the two runs? Finally, am I correct to assume that combining the two results in a data frame would give me the equivalent of the union of the genomic segments found within my cases and controls? If not, is there a way to use GRanges to obtain that union without doing it in two steps?
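For reference, here is a small sketch contrasting the two operations this question is about, again on invented coordinates. Concatenating the two subsetByOverlaps() results with c() and taking a set union of the two inputs with union() are used here only as candidate approaches; whether either one matches the intended "union" is an assumption.

```r
library(GenomicRanges)

# Hypothetical toy data, invented for illustration.
cases    <- GRanges("chr1", IRanges(start = c(10, 200), end = c(15, 250)))
controls <- GRanges("chr1", IRanges(start = c(1, 240),  end = c(100, 300)))

# The two "within" runs, one in each direction.
cases_in_controls <- subsetByOverlaps(cases, controls, type = "within")
controls_in_cases <- subsetByOverlaps(controls, cases, type = "within")

# Option 1: concatenate the two results (ranges are kept as they are).
combined <- c(cases_in_controls, controls_in_cases)

# Option 2: set union of all ranges; overlapping ranges are merged
# and metadata columns are not carried over.
merged <- union(cases, controls)

length(combined)  # 1
length(merged)    # 2
```

On this toy input the two options give different results: the concatenation keeps only the single case segment nested inside a control segment, while union() returns the merged footprint of every segment (chr1:1-100 and chr1:200-300).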