Question

GenomicRanges: Limit overlaps to one occurrence

22 months ago

sorrymouse ▴ 120

I have several genomic ranges datasets with metadata. I want to build one large master set that records, for each master range, the overlapping range from each dataset, with gaps where a dataset has no overlapping range. However, some datasets have more than one range overlapping the same master range. The problem is that when I then overlap the next dataset, the rows start repeating: every hit from one dataset gets combined with every hit from the next. For example:

Dataset 1:

chr2L   13550276    13551760

Dataset 2:

chr2L   13550975    13551760
chr2L   13550276    13550808

Dataset 3:

chr2L   13550975    13551734
chr2L   13550304    13550803

Both join and plyranges produce the following:

chr2L   13550276    13551760    chr2L   13550975    13551760    chr2L   13550304    13550803
chr2L   13550276    13551760    chr2L   13550975    13551760    chr2L   13550975    13551734
chr2L   13550276    13551760    chr2L   13550276    13550808    chr2L   13550975    13551734
chr2L   13550276    13551760    chr2L   13550276    13550808    chr2L   13550304    13550803

I want it so that once a range has been pulled out of the bag it cannot be pulled again, so that the output looks more like this:

chr2L   13550276    13551760    chr2L   13550975    13551760    chr2L   13550975    13551734
chr2L   13550276    13551760    chr2L   13550276    13550808    chr2L   13550304    13550803

Another way of thinking about it: column 1 is the master, and the other columns only need to overlap column 1, not each other.
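To make the target concrete, here is a minimal sketch of one way that pairing could be produced, using the toy ranges from the example. It assumes that "pulled out of the bag once" can be approximated by ranking each dataset's hits by position within a master range and pairing hits of equal rank. The helper `hits_by_rank()` and the column names are hypothetical, invented for illustration. `GRanges()`, `findOverlaps()`, `queryHits()` and `subjectHits()` are the standard GenomicRanges/S4Vectors calls.

```r
library(GenomicRanges)

# Toy data copied from the example above.
master <- GRanges("chr2L", IRanges(13550276, 13551760))
ds2 <- GRanges("chr2L", IRanges(c(13550975, 13550276), c(13551760, 13550808)))
ds3 <- GRanges("chr2L", IRanges(c(13550975, 13550304), c(13551734, 13550803)))

# Hypothetical helper: overlap ONE dataset against the master only, then
# number its hits 1, 2, ... within each master range, ordered by position.
hits_by_rank <- function(master, ds, label) {
  ds <- sort(ds)                                  # order ranges by position
  h <- findOverlaps(master, ds)
  df <- data.frame(master_idx = queryHits(h), hit = subjectHits(h))
  df <- df[order(df$master_idx, df$hit), ]
  df$rank <- ave(df$hit, df$master_idx, FUN = seq_along)
  out <- data.frame(master_idx = df$master_idx,
                    rank = df$rank,
                    start = start(ds)[df$hit],
                    end = end(ds)[df$hit])
  names(out)[3:4] <- paste0(label, c("_start", "_end"))
  out
}

tab2 <- hits_by_rank(master, ds2, "ds2")
tab3 <- hits_by_rank(master, ds3, "ds3")

# Full outer join on (master range, rank): the k-th hit of one dataset is
# paired with the k-th hit of the other; a missing rank becomes an NA gap.
paired <- merge(tab2, tab3, by = c("master_idx", "rank"), all = TRUE)
paired$master_start <- start(master)[paired$master_idx]
paired$master_end <- end(master)[paired$master_idx]
paired
```

On the toy data this gives two rows instead of four, because the datasets are never overlapped against each other. Rows come out ordered by start position, so the row order differs from the hand-written target above. Master ranges with no hit in any dataset are dropped by this sketch and would need to be added back separately. Pairing by rank is only one possible rule, and it does not check that the paired ranges overlap one another.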

Any ideas?

R GenomicRanges • 377 views
