Calculate distances between items in different GRanges
7.9 years ago
igor 13k

I have two GRanges objects. I would like to calculate distances from specific items in one to specific items in the other. For example, I have genes and peaks and I want to get the distances between them. Is there a good way to do that?

There is GenomicRanges::distance, but that expects a single range. I tried using that and it works fine for individual pairs of ranges. However, iterating through all the combinations takes a really long time. Using apply or multi-threaded foreach is still slow (more than a day for a million pairs). This can't be the proper way.

I am familiar with GenomicRanges::distanceToNearest and that works when you are comparing two GRanges objects, but it only returns the nearest hit.

So is there an efficient way to determine distances between items in two GRanges?

R bioconductor granges • 7.8k views

Interesting question. I don't know if I know the answer. I understand you don't want the distance from all genes to all peaks but only for a subset of them (?). Could you add a minimal example of what you have tried, to give a better idea of what you want?


I want distances between specific peaks and genes. For example, distance between each peak and all nearby genes (genes within a certain region). I have specific peak-gene pairs I am interested in.

I ended up solving this by taking my data frame of peak-gene pairs and adding the peak positions (subsetting the peaks GRanges by the peak column) and then the gene positions (subsetting the genes GRanges by the gene column). Then I could use some if-else statements to calculate the distance in the right orientation. All of that is vectorized, so it's essentially instant. However, it feels like a poor hack. I would think GenomicRanges has something like that built in.
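For reference, here is a minimal sketch of that pair-table approach using GenomicRanges::distance directly. As far as I can tell, distance() works element by element on two GRanges of equal length, so subsetting both objects by the pair columns may be all that is needed. The toy coordinates, the names (p1, gA, ...) and the pairs data frame are made up for illustration.

```r
library(GenomicRanges)

# Toy data: all coordinates and names below are invented for this example.
peaks <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges = IRanges(start = c(100, 500, 900), end = c(150, 550, 950))
)
names(peaks) <- c("p1", "p2", "p3")

genes <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges = IRanges(start = c(300, 520, 100), end = c(400, 800, 200))
)
names(genes) <- c("gA", "gB", "gC")

# Hypothetical table of the specific peak-gene pairs of interest.
pairs <- data.frame(
  peak = c("p1", "p2", "p2", "p3"),
  gene = c("gA", "gA", "gB", "gC"),
  stringsAsFactors = FALSE
)

# Subset each GRanges by name so that row i of both objects is pair i,
# then let distance() compare them element by element.
pairs$distance <- distance(peaks[pairs$peak], genes[pairs$gene])

print(pairs)
```

Overlapping pairs (here p2 and gB) come out as 0. Pairs on different chromosomes should come out as NA, so it is worth checking for NA values before using the result.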

6.4 years ago
HectorH ▴ 20

Just a remark: The question seems to be "How to calculate all distances between different GRanges".

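As far as I can tell, GenomicRanges::distance compares ranges pair by pair: each range in the 1st GRanges is matched with the corresponding range in the 2nd. The argument select="all" belongs to distanceToNearest rather than distance, and there it returns all ties for the nearest range, not the distances to every range. To get all distances between ONE range from the 1st GRanges and ALL ranges from the 2nd GRanges object, repeat that one range to the length of the 2nd object and pass both to distance.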
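A small sketch of the one-against-all case, plus the "all genes within a window" variant from the comments above. It assumes that findOverlaps with maxgap returns every pair separated by at most that many bases (the behaviour in recent Bioconductor releases, as I understand it). The coordinates and the window size are made up.

```r
library(GenomicRanges)

# Toy data: invented coordinates, all on one chromosome.
peaks <- GRanges("chr1", IRanges(start = c(1000, 5000), width = 100))
genes <- GRanges("chr1", IRanges(start = c(1500, 3000, 5050, 9000), width = 200))

# One peak against every gene: repeat the peak so both inputs have equal length.
d_one <- distance(rep(peaks[1], length(genes)), genes)

# Every peak against all genes within a window (hypothetical cutoff of 1000 bp).
window <- 1000
hits <- findOverlaps(peaks, genes, maxgap = window, ignore.strand = TRUE)

nearby <- data.frame(
  peak = queryHits(hits),
  gene = subjectHits(hits),
  distance = distance(peaks[queryHits(hits)], genes[subjectHits(hits)],
                      ignore.strand = TRUE)
)

print(d_one)
print(nearby)
```

Computing distances only for the pairs returned by findOverlaps avoids building the full peaks-by-genes matrix, which matters when both objects are large.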

3.0 years ago

5 years late, but I need to do this too. Here is what I do:

I have gr1 and gr2, which are two GRanges objects of the same length. I want to calculate the distance between the corresponding elements of each.

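For ranges in gr1 that are downstream of their corresponding ranges in gr2 (in coordinate terms, ignoring strand: gr1 starts after gr2 ends), the distance is:

start(gr1) - end(gr2) - 1

For ranges in gr1 that are upstream of their corresponding ranges in gr2 (gr1 ends before gr2 starts), the distance is:

start(gr2) - end(gr1) - 1

We do not need to know in advance whether each element of gr1 is upstream or downstream of gr2. You can tell by the sign: for non-overlapping ranges, only the expression that applies is non-negative, and the other one is negative. So, simply take the pmax of these calculations.

downstream <- start(gr1) - end(gr2) - 1
upstream <- start(gr2) - end(gr1) - 1
distance <- pmax(upstream, downstream)

If two ranges overlap, both upstream and downstream will be negative. So replace these with 0.

distance[distance < 0] <- 0

Note that this only looks at coordinates, so it assumes the two ranges in each pair are on the same chromosome.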
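As a sanity check, here is a short sketch comparing this hand-rolled calculation with GenomicRanges::distance on a few toy pairs. It assumes the gap definition where adjacent ranges have distance 0 (hence the - 1 in each expression). The coordinates are invented and all on one chromosome, and the names gap_a, gap_b and manual are just for this example.

```r
library(GenomicRanges)

# Toy pairs: gr1 before gr2, gr1 after gr2, overlapping, and adjacent.
gr1 <- GRanges("chr1", IRanges(start = c(10, 100, 50, 21), end = c(20, 120, 60, 30)))
gr2 <- GRanges("chr1", IRanges(start = c(30, 40, 55, 10), end = c(35, 60, 70, 20)))

# Number of bases between the ranges, computed in both orientations.
gap_a <- start(gr1) - end(gr2) - 1L  # non-negative when gr1 lies after gr2
gap_b <- start(gr2) - end(gr1) - 1L  # non-negative when gr1 lies before gr2

# Keep the orientation that applies; overlapping pairs are clipped to 0.
manual <- pmax(gap_a, gap_b, 0L)

# Built-in, element-by-element distance for comparison.
builtin <- distance(gr1, gr2)

print(manual)
print(builtin)
stopifnot(all(manual == builtin))
```

The built-in version also handles pairs on different chromosomes or opposite strands (returning NA, as far as I know), which the coordinate-only arithmetic does not.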