Get upstream and downstream intergenic regions in R
1
2
Entering edit mode
8.2 years ago

Hi all,

I would like to get the upstream and downstream intergenic distance between two genes in R (taking into account that some genes may overlap). For example - in the following case I would expect to get a the upstream distance of 5 for gene A, a distance 0 or negative for downstream of A, and a distance of 10 for downstream of B/upstream of C.

-----|____gene A____|-------------------------|____gene_C___|---     + strand
---------------|______gene_B_______|----------------------------     - strand

I tried to use precede() and follow() in GRanges but these two functions completely exclude overlapping genes from the analysis. This sounds like a task everyone is doing and it should be part of a package but I can't find anything appropriate.

Thank you for any hints!

R GRanges intergenic • 3.4k views
ADD COMMENT
0
Entering edit mode

I think your problem might be similar to that of finding the distance between motifs asked in this question. My answer there could be a starting point.

ADD REPLY
0
Entering edit mode

OK on second thought I see you want to obtain a negative distance for overlapping genes (to tell the amount of overlapping I guess). I am not sure how can you get this... (might think about it later).

ADD REPLY
4
Entering edit mode
8.2 years ago
ddiez ★ 2.0k

Here is some attempt. I create an example ranges (IRanges for simplicity). You can used this modified version of plotRanges() function (from the IRangesOverview vignette in the IRanges package) to visualize the ranges and how they overlap.

The problem using follow() and precede() is that they ignore overlapping ranges. Therefore here instead compute the sequence-wide comparison of distances. Maybe this needs to be modified to be more efficient and only do this in the closest sequence. However, with overlapping sequences it is not so clear to me what closest means.

library(IRanges)
foo <- IRanges(start = c(1, 7, 10, 5, 13), end = c(4, 12, 15, 8, 14))

plotRanges <- function(x,
                       xlim = x,
                       main = deparse(substitute(x)),
                       col = "black",
                       sep = 0.5,
                       ...)

{
  height <- 1
  if (is(xlim, "Ranges"))
    xlim <-
      c(min(start(xlim)), max(end(xlim)))
  bins <-
    disjointBins(IRanges(start(x), end(x) + 1))
  plot.new()
  plot.window(xlim, c(0, max(bins) * (height + sep)))
  ybottom <-
    bins * (sep + height) - height
  rect(start(x), ybottom, end(x), ybottom + height, col = col, ...)
  text(c(start(x) + width(x)/2) - .5, y = ybottom + height /2)
  title(main)
  r <- range(x)
  axis(1, at = seq(start(r), end(r)))
}

plotRanges(foo, col = rainbow(length(foo)))
foo
IRanges object with 5 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         4         4
  [2]         7        12         6
  [3]        10        15         6
  [4]         5         8         4
  [5]        13        14         2

I compute the sequence-wise comparison as this:

# comparison matrix.
combmatrix <- t(combn(length(foo), 2))
combmatrix
colnames(combmatrix) <- c("i", "j")
      i j
 [1,] 1 2
 [2,] 1 3
 [3,] 1 4
 [4,] 1 5
 [5,] 2 3
 [6,] 2 4
 [7,] 2 5
 [8,] 3 4
 [9,] 3 5

Then iterate over the comparison matrix and compute the distances:

hoo <- apply(combmatrix, 1, function(x) {
  s1 <- foo[x[1], ]
  s2 <- foo[x[2], ]
  if (start(s1) < start(s2))
    start(s2) - end(s1)
  else
    start(s1) - end(s2)
})

cbind(combmatrix, distance = hoo)
      i j distance
 [1,] 1 2        3
 [2,] 1 3        6
 [3,] 1 4        1
 [4,] 1 5        9
 [5,] 2 3       -2
 [6,] 2 4       -1
 [7,] 2 5        1
 [8,] 3 4        2
 [9,] 3 5       -2
[10,] 4 5        5

Note that for the comparison between sequences 3 and 5, it is not clear to me what the right answer would be in this case. I guess this is the reason distances to overlapping ranges are zero in IRanges (and GRanges).

ADD COMMENT
0
Entering edit mode

I'll mark your answer as accepted since I've done something similar in the end; but it also answers my original question - namely, there is no such function or parameter in Granges. Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 2902 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6