In the GenomicRanges package for R, is there an option to setdiff() that prevents adjacent ranges from being collapsed? For example, suppose I have the following:
gr1 = GRanges object
[1] chr1 1-10 *
[2] chr1 11-20 *
[3] chr1 21-30 *
gr2 = GRanges object
[1] chr1 18-25 *
and I take setdiff(gr1, gr2):
What I get:
setdiff(gr1, gr2)
[1] chr1 1-17 * # <-- 1-10 and 11-17 have been merged automatically; I don't want this.
[2] chr1 26-30 *
The output stops at 17 and resumes at 26 to avoid gr2 (18-25), so that much is correct. Unfortunately, the output now has a single continuous range from 1 to 17 instead of 1-10 followed by a separate, directly adjacent range from 11-17. A reduce operation is applied automatically, and I want to suppress it.
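A minimal reproducible version of the example, assuming the Bioconductor package GenomicRanges is installed. It illustrates where the merging comes from: reduce(), with its default min.gapwidth = 1, merges ranges that merely touch, so the three adjacent ranges of gr1 collapse into chr1:1-30, and setdiff() returns reduced ranges. In this example, subtracting gr2 from the already-reduced gr1 gives exactly the same answer.

```r
library(GenomicRanges)

# Rebuild the example: three adjacent, non-overlapping ranges on chr1
gr1 <- GRanges("chr1", IRanges(start = c(1, 11, 21), end = c(10, 20, 30)))
gr2 <- GRanges("chr1", IRanges(start = 18, end = 25))

# reduce() merges ranges that overlap or touch (default min.gapwidth = 1),
# so gr1 becomes the single range chr1:1-30
reduced <- reduce(gr1)
print(reduced)

# setdiff() returns reduced ranges: chr1:1-17 and chr1:26-30
res <- setdiff(gr1, gr2)
print(res)

# Same result as subtracting gr2 from the reduced gr1
print(all(res == setdiff(reduced, gr2)))
```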
What I want:
setdiff( gr1, gr2, <Some_option_to_suppress_reduce> )
[1] chr1 1-10 *
[2] chr1 11-17 *
[3] chr1 26-30 *
I don’t want these first two regions to be reduced into one.
What I've tried:
The best solution I've come up with so far is to apply setdiff() to each range separately, collect the results in a list, and convert back to GRanges, like this:
unlist(GRangesList(lapply(seq_along(gr1), function(i) setdiff(gr1[i], gr2))))
This does what I want, but because it calls setdiff() once per range and then converts between data types, it is very slow and inefficient. Is there an option to turn off the reduction directly (or some other, more elegant solution)?
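One way to avoid the per-range lapply() loop is a vectorized sketch: compute the reduced setdiff once, then clip each range of gr1 to the uncovered pieces it overlaps using findOverlaps() and pintersect(). The helper name subtract_each() is hypothetical (it is not part of GenomicRanges), and the sketch assumes unstranded (*) ranges as in the question; stranded data would need more care. Ranges of gr1 that are entirely covered by gr2 simply produce no pieces.

```r
library(GenomicRanges)

gr1 <- GRanges("chr1", IRanges(start = c(1, 11, 21), end = c(10, 20, 30)))
gr2 <- GRanges("chr1", IRanges(start = 18, end = 25))

# Hypothetical helper, not a GenomicRanges function
subtract_each <- function(x, y) {
  # Everything covered by x but not by y (this result is reduced)
  uncovered <- setdiff(x, y)
  # Pair each range of x with every uncovered piece it overlaps;
  # adjacent ranges share no position, so they do not count as overlapping
  hits <- findOverlaps(x, uncovered)
  # Clip each x range to its uncovered pieces, so x's own boundaries survive
  pintersect(x[queryHits(hits)], uncovered[subjectHits(hits)])
}

res <- subtract_each(gr1, gr2)
print(res)
```

Because findOverlaps() returns hits ordered by query, the pieces come out in the order of gr1.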
Apparently PyRanges does this by default with its subtract() function, which I think is a much more logical default for something called 'subtract'. (For a function called setdiff() I can understand this choice of default behavior, but it would be nice if there were an equivalent 'subtract' function for GRanges.) See https://pyranges.readthedocs.io/en/latest/autoapi/pyranges/index.html#pyranges.PyRanges.subtract
Solution: switch to PyRanges, or use the function below.
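If you use
GenomicRanges::subtract(gr1, gr2) %>% unlist()
you'll get the difference without reduction. subtract() returns a GRangesList with one element per range of gr1, and unlist() flattens it back into a GRanges. The %>% pipe comes from magrittr; without it, unlist(GenomicRanges::subtract(gr1, gr2)) is equivalent.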
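A short sketch of that answer on the example data, assuming GenomicRanges is installed. It calls the function as GenomicRanges::subtract() explicitly, because magrittr exports its own subtract() (an alias for the minus operator) that can mask it when magrittr is attached.

```r
library(GenomicRanges)

gr1 <- GRanges("chr1", IRanges(start = c(1, 11, 21), end = c(10, 20, 30)))
gr2 <- GRanges("chr1", IRanges(start = 18, end = 25))

# subtract() removes gr2 from each range of gr1 separately and returns
# a GRangesList parallel to gr1 (one element per range of gr1)
per_range <- GenomicRanges::subtract(gr1, gr2)
print(length(per_range))

# Flatten to a GRanges; the adjacent pieces 1-10 and 11-17 stay separate
res <- unlist(per_range)
print(res)
```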