I have two GRanges objects. I would like to calculate distances between specific items in one and specific items in the other. For example, I have genes and peaks and I want to get the distances between them. Is there a good way to do that?
There is `GenomicRanges::distance`, but that expects a single range. I tried using that and it works fine for individual pairs of ranges. However, iterating through all the combinations takes a really long time. Using `apply` or multi-threaded `foreach` is still slow (more than a day for a million pairs). This can't be the proper way.
I am familiar with `GenomicRanges::distanceToNearest`, and that works when you are comparing two GRanges objects, but it only returns the nearest hit.
So is there an efficient way to determine distances between items in two GRanges?
Interesting question; I'm not sure I know the answer. As I understand it, you don't want the distance from all genes to all peaks, only for a subset of them (?). Could you add a minimal example of what you have tried, to give a better idea of what you want?
I want distances between specific peaks and genes. For example, distance between each peak and all nearby genes (genes within a certain region). I have specific peak-gene pairs I am interested in.
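For reference, a minimal sketch of one way to get such distances without looping. It assumes that `GenomicRanges::distance()`, when given two GRanges of equal length, works element-wise (the i-th range of the first against the i-th range of the second), and that `findOverlaps()` with `maxgap` can be used to define "nearby". The coordinates, the names `p1` to `g3`, the `pairs` table and the 100 bp window are all made up for illustration.

```r
library(GenomicRanges)

# Toy data: every coordinate and name below is hypothetical.
peaks <- GRanges("chr1",
                 IRanges(start = c(100, 500, 900), width = 50,
                         names = c("p1", "p2", "p3")))
genes <- GRanges("chr1",
                 IRanges(start = c(200, 400, 1000), end = c(299, 520, 1100),
                         names = c("g1", "g2", "g3")),
                 strand = c("+", "-", "+"))

# Case 1: a table of peak-gene pairs that is already known.
pairs <- data.frame(peak = c("p1", "p2", "p3", "p3"),
                    gene = c("g1", "g2", "g1", "g3"),
                    stringsAsFactors = FALSE)

# Subset both objects so that row i of each is one pair, then let
# distance() work on the two parallel objects in a single call.
pairs$dist <- distance(peaks[pairs$peak], genes[pairs$gene],
                       ignore.strand = TRUE)

# Case 2: build the pairs from a window, here genes within 100 bp of a peak.
hits <- findOverlaps(peaks, genes, maxgap = 100L, ignore.strand = TRUE)
nearby <- data.frame(peak = names(peaks)[queryHits(hits)],
                     gene = names(genes)[subjectHits(hits)],
                     stringsAsFactors = FALSE)
nearby$dist <- distance(peaks[queryHits(hits)], genes[subjectHits(hits)],
                        ignore.strand = TRUE)
```

Note that the value returned here is an unsigned gap (0 for overlapping or adjacent ranges, `NA` for ranges on different sequences), so any upstream/downstream orientation would still have to be worked out separately.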
I ended up solving this by taking my data frame with the peak and gene pairs and adding to it the positions for peaks (subsetting peaks GR to peaks col) and then the positions for genes (subsetting genes GR to genes col). Then I could use some if-else statements to calculate the distance in the right orientation. All of that is vectorized, so it's essentially instant. However, it feels like a poor hack. I would think GenomicRanges has something like that built in.
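The approach described above can be sketched roughly as follows in base R. This is only an illustration under assumptions: the column names, the toy coordinates and the sign convention (negative when the peak lies upstream of the gene relative to the gene's strand) are all hypothetical, and the gap is counted as the number of positions separating the two ranges, so overlapping or adjacent ranges give 0.

```r
# Hypothetical pairs table with peak and gene positions already merged in.
pairs <- data.frame(
  peak        = c("p1", "p2", "p3", "p3", "p4"),
  gene        = c("g1", "g2", "g1", "g3", "g2"),
  peak_start  = c(100, 500, 900, 900, 600),
  peak_end    = c(149, 549, 949, 949, 649),
  gene_start  = c(200, 400, 200, 1000, 400),
  gene_end    = c(299, 520, 299, 1100, 520),
  gene_strand = c("+", "-", "+", "+", "-"),
  stringsAsFactors = FALSE
)

# Unsigned gap: positions strictly between the two ranges, floored at 0
# so that overlapping and adjacent ranges both give 0.
gap <- pmax(0, pmax(pairs$gene_start - pairs$peak_end,
                    pairs$peak_start - pairs$gene_end) - 1)

# Orientation (assumed convention): a peak is upstream when it sits to the
# left of a "+" gene or to the right of a "-" gene.
peak_left  <- pairs$peak_end < pairs$gene_start
peak_right <- pairs$peak_start > pairs$gene_end
upstream   <- (peak_left & pairs$gene_strand == "+") |
              (peak_right & pairs$gene_strand == "-")

# Vectorized if-else: negative for upstream peaks, positive otherwise.
pairs$signed_dist <- ifelse(upstream, -gap, gap)
```

Every step operates on whole columns, which is why it stays fast even for millions of pairs.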