pyranges functions like `overlap` and `intersect` are great for selecting subsets of ranges that overlap, but I need to keep track of _which_ ranges overlap. I can think of two ways to accomplish this (both have analogies in GRanges, and I'd be happy to find something in pyranges that does either A or B):
A) One approach would be analogous to GRangesLists. Let's say I have the following two objects:
pr1: ||------|<--1-->|-----|<-2->||<---3--->|-----------|<--4-->|----------||
pr2: ||-------|<--------- 1 ------->|---|<-------- 2 ----->|---------<-3->-||
Then `pr2.List_like_overlap(pr1)` would return a dict of PyRanges objects, keyed by the ranges of pr2:
>>> pr2.List_like_overlap(pr1)
["1"]
1
2
3
["2"]
3
4
["3"]
#empty
Or something like that (note that `pr1.3` appears in the sets for _both_ `pr2.1` and `pr2.2`). The goal would then be to average over the pr1 entries 1, 2, and 3 and assign the result to `pr2.1` (likewise for `pr1.3,4` --> `pr2.2`).
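To make approach A concrete, here is a minimal pure-Python sketch of the behaviour I have in mind. It does not use pyranges at all: the coordinates and the per-range values are made up to mirror the diagram, intervals are assumed to be half-open, and `list_like_overlap` is just a hypothetical stand-in for the function I'm looking for.

```python
from statistics import mean

# Hypothetical half-open (start, end) coordinates chosen to mirror the diagram;
# the dict keys are the 1-based labels used in the post.
pr1 = {1: (7, 14), 2: (20, 25), 3: (26, 35), 4: (47, 54)}
pr2 = {1: (8, 30), 2: (34, 52), 3: (62, 66)}

# Hypothetical per-range values attached to pr1 (not from any real data set).
pr1_score = {1: 1.0, 2: 2.0, 3: 6.0, 4: 4.0}


def overlaps(a, b):
    """Return True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def list_like_overlap(query, subject):
    """Map each query label to the sorted list of subject labels it overlaps."""
    return {
        q: sorted(s for s, s_iv in subject.items() if overlaps(q_iv, s_iv))
        for q, q_iv in query.items()
    }


groups = list_like_overlap(pr2, pr1)

# Average the pr1 values within each group; None where nothing overlaps.
averaged = {
    q: (mean(pr1_score[s] for s in hits) if hits else None)
    for q, hits in groups.items()
}

print(groups)
print(averaged)
```

With these made-up inputs, `pr1.3` lands in the groups for both `pr2.1` and `pr2.2`, and `pr2.3` gets an empty group, which is the grouping described above.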
B) The alternative would be something like `pr2.findOverlapPairs(pr1)`. In this case I would simply get back pairs of integers (e.g. tuples, or whatever) telling me the pairs of indices from `self` and `other` that overlap:
>>> pr2.findOverlapPairs(pr1)
(1,1)
(1,2)
(1,3)
(2,3)
(2,4)
I could take it from there and just grab entries with the appropriate indices from either object. Perhaps this approach has less overhead.
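For reference, the pair-based behaviour in B can be sketched in plain Python as follows. This is only an illustration of the output I want, not a pyranges call: `find_overlap_pairs` is a hypothetical name, the coordinates are invented to match the diagram, intervals are assumed half-open, and indices are 1-based to match the listing above.

```python
# Hypothetical half-open (start, end) coordinates mirroring the diagram.
pr1 = [(7, 14), (20, 25), (26, 35), (47, 54)]
pr2 = [(8, 30), (34, 52), (62, 66)]


def find_overlap_pairs(query, subject):
    """Return 1-based (query_index, subject_index) pairs of overlapping intervals."""
    pairs = []
    for qi, (q_start, q_end) in enumerate(query, start=1):
        for si, (s_start, s_end) in enumerate(subject, start=1):
            # Two half-open intervals overlap when each starts before the other ends.
            if q_start < s_end and s_start < q_end:
                pairs.append((qi, si))
    return pairs


pairs = find_overlap_pairs(pr2, pr1)
for pair in pairs:
    print(pair)
```

The nested loop is quadratic in the number of ranges, so it only serves to pin down the expected result; a real solution would presumably rely on whatever interval index the library uses internally.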
Or perhaps there's some other solution I'm not seeing, but I hope it's clear what I'm looking for. Can anyone suggest a function that behaves like either `pr.List_like_overlap()` or `pr.findOverlapPairs()`?
Edited for clarity