Question

Can GenomicRanges do bedtools-like intersection operations?


7.0 years ago

endrebak ▴ 980

In bedtools, intersect means "for each read in A, for each read in B that overlaps it, report the part of A that overlaps that read in B".

So if you have 4 reads in A and 6 reads in B and they all overlap, you get 4*6 = 24 results.

GenomicRanges intersect works differently: for both A and B, it first merges overlapping reads into a single range and then does the intersection. Is it possible to get GenomicRanges intersect to work like bedtools?

Here is the input with the expected output:

head tests/f2.bed tests/f3.bed
==> tests/f2.bed <==
chr1    1   2   f   0   +
chr1    6   7   f   0   -

==> tests/f3.bed <==
chr1    3   6   h   0   +
chr1    4   7   h   0   -
chr1    5   7   h   0   -
chr1    8   9   h   0   +

bedtools intersect -a tests/f3.bed -b tests/f2.bed
chr1    6   7   h   0   -
chr1    6   7   h   0   -
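
To make the difference concrete, here is a minimal sketch in R of what the set-like GenomicRanges intersect does with this input. It assumes the two BED files are typed in by hand as GRanges objects (the variable names `f2`, `f3` and `merged` are only illustrative), with starts shifted by one because BED is 0-based and half-open while GRanges is 1-based and closed.

```r
library(GenomicRanges)

# tests/f2.bed and tests/f3.bed entered by hand (assumption: not read from disk).
# BED start + 1 gives the 1-based GRanges start; the end is unchanged.
f2 <- GRanges("chr1",
              IRanges(start = c(2, 7), end = c(2, 7)),
              strand = c("+", "-"))
f3 <- GRanges("chr1",
              IRanges(start = c(4, 5, 6, 9), end = c(6, 7, 7, 9)),
              strand = c("+", "-", "-", "+"))

# Set-like intersection: each input is treated as a set of genomic positions,
# so overlapping reads are merged before the positions are intersected.
merged <- intersect(f3, f2, ignore.strand = TRUE)

# A single range comes back, not one row per overlapping pair of reads.
length(merged)
```

The two overlapping reads in f3 collapse into one region, which is why the result has one row where bedtools reports two.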


ADD COMMENT • link updated 7.0 years ago by bruce.moran ▴ 970 • written 7.0 years ago by endrebak ▴ 980


Try findOverlaps. You can then use GenomicRanges::reduce on a combined GRanges object made from both sets of reads (using c() works for that, IIRC). With the option with.revmap=TRUE, you can trace each reduced region back to the rows of the combined GRanges that it came from, and so to that particular overlap.

If you set up a reprex I can show you how this might work.
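
For readers who want something concrete, the findOverlaps part of this suggestion can be sketched as follows. This is one possible approach, not necessarily what the commenter had in mind: it pairs up overlapping reads with findOverlaps and clips each read from A to its partner in B using pmax/pmin, and it does not use the reduce step. The objects `a`, `b`, `hits` and `res` are illustrative names, and the input is the question's example typed in by hand with BED starts shifted to 1-based coordinates.

```r
library(GenomicRanges)

# A = tests/f3.bed, B = tests/f2.bed (assumption: entered by hand, BED start + 1).
a <- GRanges("chr1",
             IRanges(start = c(4, 5, 6, 9), end = c(6, 7, 7, 9)),
             strand = c("+", "-", "-", "+"),
             name = "h")
b <- GRanges("chr1",
             IRanges(start = c(2, 7), end = c(2, 7)),
             strand = c("+", "-"),
             name = "f")

# One hit per overlapping (A, B) pair; strand is ignored, as bedtools does
# when -s is not given.
hits <- findOverlaps(a, b, ignore.strand = TRUE)
qa <- a[queryHits(hits)]
sb <- b[subjectHits(hits)]

# Clip each A read to the part shared with its B partner, keeping A's
# strand and metadata columns.
res <- qa
ranges(res) <- IRanges(start = pmax(start(qa), start(sb)),
                       end   = pmin(end(qa), end(sb)))

# Convert back to BED-style coordinates for comparison with bedtools.
data.frame(chrom = as.character(seqnames(res)),
           start = start(res) - 1L,
           end   = end(res),
           name  = res$name,
           strand = as.character(strand(res)))
```

Because every hit produces its own row, 4 reads in A that all overlap 6 reads in B would give 24 rows, which is the pairwise behaviour the question asks for.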

ADD REPLY • link 7.0 years ago by bruce.moran ▴ 970


Okay, thanks. I am not that interested in the exact way of doing it; I was just wondering whether it was possible without hand-rolling a solution. It seems like most GenomicRanges operations are set-like.

ADD REPLY • link 7.0 years ago by endrebak ▴ 980


Great, I just submitted an answer to your query.

ADD REPLY • link 7.0 years ago by bruce.moran ▴ 970


Hello, I am sorry to reopen this question. I want to run a bedtools intersect-like operation on GenomicRanges in R, but I am not clear about the process you mentioned. The input and output I expect are similar to those in this question. Could you please give me more information? Best, Zhang