Here is some attempt. I create an example ranges (IRanges for simplicity). You can used this modified version of plotRanges()
function (from the IRangesOverview vignette in the IRanges package) to visualize the ranges and how they overlap.
The problem using follow() and precede() is that they ignore overlapping ranges. Therefore here instead compute the sequence-wide comparison of distances. Maybe this needs to be modified to be more efficient and only do this in the closest sequence. However, with overlapping sequences it is not so clear to me what closest means.
library(IRanges)
foo <- IRanges(start = c(1, 7, 10, 5, 13), end = c(4, 12, 15, 8, 14))
plotRanges <- function(x,
xlim = x,
main = deparse(substitute(x)),
col = "black",
sep = 0.5,
...)
{
height <- 1
if (is(xlim, "Ranges"))
xlim <-
c(min(start(xlim)), max(end(xlim)))
bins <-
disjointBins(IRanges(start(x), end(x) + 1))
plot.new()
plot.window(xlim, c(0, max(bins) * (height + sep)))
ybottom <-
bins * (sep + height) - height
rect(start(x), ybottom, end(x), ybottom + height, col = col, ...)
text(c(start(x) + width(x)/2) - .5, y = ybottom + height /2)
title(main)
r <- range(x)
axis(1, at = seq(start(r), end(r)))
}
plotRanges(foo, col = rainbow(length(foo)))
foo
IRanges object with 5 ranges and 0 metadata columns:
start end width
<integer> <integer> <integer>
[1] 1 4 4
[2] 7 12 6
[3] 10 15 6
[4] 5 8 4
[5] 13 14 2
I compute the sequence-wise comparison as this:
# comparison matrix.
combmatrix <- t(combn(length(foo), 2))
combmatrix
colnames(combmatrix) <- c("i", "j")
i j
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 1 5
[5,] 2 3
[6,] 2 4
[7,] 2 5
[8,] 3 4
[9,] 3 5
Then iterate over the comparison matrix and compute the distances:
hoo <- apply(combmatrix, 1, function(x) {
s1 <- foo[x[1], ]
s2 <- foo[x[2], ]
if (start(s1) < start(s2))
start(s2) - end(s1)
else
start(s1) - end(s2)
})
cbind(combmatrix, distance = hoo)
i j distance
[1,] 1 2 3
[2,] 1 3 6
[3,] 1 4 1
[4,] 1 5 9
[5,] 2 3 -2
[6,] 2 4 -1
[7,] 2 5 1
[8,] 3 4 2
[9,] 3 5 -2
[10,] 4 5 5
Note that for the comparison between sequences 3 and 5, it is not clear to me what the right answer would be in this case. I guess this is the reason distances to overlapping ranges are zero in IRanges (and GRanges).
I think your problem might be similar to that of finding the distance between motifs asked in this question. My answer there could be a starting point.
OK on second thought I see you want to obtain a negative distance for overlapping genes (to tell the amount of overlapping I guess). I am not sure how can you get this... (might think about it later).