Question

Get indices from GenomicRanges reduce function

0

Entering edit mode

6.8 years ago

User 7754 ▴ 290

Hi,

Is there any way to extract only the indices of rows that would be reduced by the "reduce" function in GenomicRanges?

Thank you

GenomicRanges reduce R • 3.2k views

ADD COMMENT • link updated 6.8 years ago by Nicolas Rosewick 11k • written 6.8 years ago by User 7754 ▴ 290

score 2 · Answer 1 · 2018-09-25

An easy way around would be to use findOverlaps() (which returns the indices of the ranges that overlap between two Granges objects) and use

findOverlaps(g1, g1)

where g1 is the Granges object for which you want to find the indices of the ranges that can be reduced.

In full code:

> test <- data.frame(rep(1, 5), seq(1000, 5000, 1000), c(1500, 3000, 5000, 5000, 5000))
> colnames(test) <- c("Seqnames", "Start", "End")
> test <- makeGRangesFromDataFrame(test)
> test
GRanges object with 5 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]        1 1000-1500      *
  [2]        1 2000-3000      *
  [3]        1 3000-5000      *
  [4]        1 4000-5000      *
  [5]        1      5000      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> test <- findOverlaps(test, test)
> test
Hits object with 13 hits and 0 metadata columns:
       queryHits subjectHits
       <integer>   <integer>
   [1]         1           1
   [2]         2           2
   [3]         2           3
   [4]         3           2
   [5]         3           3
   ...       ...         ...
   [9]         4           4
  [10]         4           5
  [11]         5           3
  [12]         5           4
  [13]         5           5
  -------
  queryLength: 5 / subjectLength: 5
> indices <- unique(queryHits(test)[queryHits(test) != subjectHits(test)])
> indices
[1] 2 3 4 5

score 1 · Answer 2 · 2018-09-25

That should work. The idea is to find perfect overlaps between the gr and its reduced form. And then to remove these entries from the original gr.

library(GenomicRanges)

x <- data.frame(chr=c("chr1","chr1","chr2","chr3","chr3"),start=c(1,5,10,10,15),end=c(10,15,20,20,30))

#> x
#   chr start end
#1 chr1     1  10
#2 chr1     5  15
#3 chr2    10  20
#4 chr3    10  20
#5 chr3    15  30

gr <- makeGRangesFromDataFrame(x, seqnames.field = "chr",start.field = "start",end.field = "end")

# > gr
# GRanges object with 5 ranges and 0 metadata columns:
#      seqnames    ranges strand
#         <Rle> <IRanges>  <Rle>
#  [1]     chr1  [ 1, 10]      *
#  [2]     chr1  [ 5, 15]      *
#  [3]     chr2  [10, 20]      *
#  [4]     chr3  [10, 20]      *
#  [5]     chr3  [15, 30]      *
#  -------
#  seqinfo: 3 sequences from an unspecified genome; no seqlengths

gr[-queryHits(findOverlaps(gr,reduce(gr),type="equal"))]

# > gr[-queryHits(findOverlaps(gr,reduce(gr),type="equal"))]
# GRanges object with 4 ranges and 0 metadata columns:
#      seqnames    ranges strand
#         <Rle> <IRanges>  <Rle>
#  [1]     chr1  [ 1, 10]      *
#  [2]     chr1  [ 5, 15]      *
#  [3]     chr3  [10, 20]      *
#  [4]     chr3  [15, 30]      *
#  -------
#  seqinfo: 3 sequences from an unspecified genome; no seqlengths