I have several genomic ranges datasets with metadata. I want to build one large master set that records, for each master range, the overlapping range from every dataset, with gaps where a dataset has no overlapping range. However, some datasets have more than one range overlapping the same master range. When I then overlap the next dataset, the rows already produced get repeated for every new hit. For example:
Dataset 1:
chr2L 13550276 13551760
Dataset 2:
chr2L 13550975 13551760
chr2L 13550276 13550808
Dataset 3:
chr2L 13550975 13551734
chr2L 13550304 13550803
The behavior I get from both join and plyranges is as follows, with every combination of hits getting its own row:
chr2L 13550276 13551760 chr2L 13550975 13551760 chr2L 13550304 13550803
chr2L 13550276 13551760 chr2L 13550975 13551760 chr2L 13550975 13551734
chr2L 13550276 13551760 chr2L 13550276 13550808 chr2L 13550975 13551734
chr2L 13550276 13551760 chr2L 13550276 13550808 chr2L 13550304 13550803
I want to make it so that once a range has been pulled out of the bag it cannot be pulled again, so that the output looks more like this:
chr2L 13550276 13551760 chr2L 13550975 13551760 chr2L 13550975 13551734
chr2L 13550276 13551760 chr2L 13550276 13550808 chr2L 13550304 13550803
Another way to think about it: column 1 is the master range, and the other columns only need to overlap column 1, not each other.
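To make that concrete, here is a rough sketch of the behavior I am after, written in plain Python with (chrom, start, end) tuples rather than real range objects. It is not plyranges code: the names `overlaps` and `master_table` are made up for illustration, and I am assuming closed coordinates and that hits can simply be paired in the order they appear in each dataset.

```python
from itertools import zip_longest


def overlaps(a, b):
    """True if two (chrom, start, end) ranges share at least one position.

    Assumes closed (inclusive) coordinates on both ends.
    """
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def master_table(master, datasets):
    """Build rows of (master_range, hit_from_ds1, hit_from_ds2, ...).

    Each dataset is compared to the master range only, never to the other
    datasets. Within one master range, each hit is used in at most one row;
    datasets with fewer hits are padded with None (the gaps).
    """
    rows = []
    for m in master:
        # One list of hits per dataset, kept in the dataset's own order
        # (assumption: that order is a reasonable way to pair hits up).
        hits = [[r for r in ds if overlaps(m, r)] for ds in datasets]
        # Lay the hit lists side by side instead of crossing them.
        combos = list(zip_longest(*hits))
        if not combos:
            # No dataset overlaps this master range: keep a row of gaps.
            combos = [(None,) * len(datasets)]
        for combo in combos:
            rows.append((m, *combo))
    return rows


dataset1 = [("chr2L", 13550276, 13551760)]
dataset2 = [("chr2L", 13550975, 13551760), ("chr2L", 13550276, 13550808)]
dataset3 = [("chr2L", 13550975, 13551734), ("chr2L", 13550304, 13550803)]

for row in master_table(dataset1, [dataset2, dataset3]):
    print(row)
```

For the three datasets above this gives two rows instead of four, because the hit lists are zipped together rather than crossed. A range that overlaps two different master ranges would still appear once under each of them.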
Any ideas?