Get indices from GenomicRanges reduce function
2
0
Entering edit mode
6.2 years ago
User 7754 ▴ 270

Hi,

Is there any way to extract only the indices of rows that would be reduced by the "reduce" function in GenomicRanges?

Thank you

GenomicRanges reduce R • 2.8k views
ADD COMMENT
2
Entering edit mode
6.2 years ago
thomaskuilman ▴ 850

An easy way around would be to use findOverlaps() (which returns the indices of the ranges that overlap between two Granges objects) and use

findOverlaps(g1, g1)

where g1 is the Granges object for which you want to find the indices of the ranges that can be reduced.

In full code:

> test <- data.frame(rep(1, 5), seq(1000, 5000, 1000), c(1500, 3000, 5000, 5000, 5000))
> colnames(test) <- c("Seqnames", "Start", "End")
> test <- makeGRangesFromDataFrame(test)
> test
GRanges object with 5 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]        1 1000-1500      *
  [2]        1 2000-3000      *
  [3]        1 3000-5000      *
  [4]        1 4000-5000      *
  [5]        1      5000      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> test <- findOverlaps(test, test)
> test
Hits object with 13 hits and 0 metadata columns:
       queryHits subjectHits
       <integer>   <integer>
   [1]         1           1
   [2]         2           2
   [3]         2           3
   [4]         3           2
   [5]         3           3
   ...       ...         ...
   [9]         4           4
  [10]         4           5
  [11]         5           3
  [12]         5           4
  [13]         5           5
  -------
  queryLength: 5 / subjectLength: 5
> indices <- unique(queryHits(test)[queryHits(test) != subjectHits(test)])
> indices
[1] 2 3 4 5
ADD COMMENT
1
Entering edit mode
6.2 years ago

That should work. The idea is to find perfect overlaps between the gr and its reduced form. And then to remove these entries from the original gr.

library(GenomicRanges)

x <- data.frame(chr=c("chr1","chr1","chr2","chr3","chr3"),start=c(1,5,10,10,15),end=c(10,15,20,20,30))

#> x
#   chr start end
#1 chr1     1  10
#2 chr1     5  15
#3 chr2    10  20
#4 chr3    10  20
#5 chr3    15  30

gr <- makeGRangesFromDataFrame(x, seqnames.field = "chr",start.field = "start",end.field = "end")

# > gr
# GRanges object with 5 ranges and 0 metadata columns:
#      seqnames    ranges strand
#         <Rle> <IRanges>  <Rle>
#  [1]     chr1  [ 1, 10]      *
#  [2]     chr1  [ 5, 15]      *
#  [3]     chr2  [10, 20]      *
#  [4]     chr3  [10, 20]      *
#  [5]     chr3  [15, 30]      *
#  -------
#  seqinfo: 3 sequences from an unspecified genome; no seqlengths

gr[-queryHits(findOverlaps(gr,reduce(gr),type="equal"))]

# > gr[-queryHits(findOverlaps(gr,reduce(gr),type="equal"))]
# GRanges object with 4 ranges and 0 metadata columns:
#      seqnames    ranges strand
#         <Rle> <IRanges>  <Rle>
#  [1]     chr1  [ 1, 10]      *
#  [2]     chr1  [ 5, 15]      *
#  [3]     chr3  [10, 20]      *
#  [4]     chr3  [15, 30]      *
#  -------
#  seqinfo: 3 sequences from an unspecified genome; no seqlengths
ADD COMMENT

Login before adding your answer.

Traffic: 2622 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6