Get indices from GenomicRanges reduce function
2
Hi,
Is there any way to extract only the indices of rows that would be reduced by the "reduce" function in GenomicRanges?
Thank you
GenomicRanges
reduce
R
• 3.2k views
An easy way around would be to use findOverlaps()
(which returns the indices of the ranges that overlap between two Granges
objects) and use
findOverlaps(g1, g1)
where g1 is the Granges
object for which you want to find the indices of the ranges that can be reduced.
In full code:
> test <- data.frame(rep(1, 5), seq(1000, 5000, 1000), c(1500, 3000, 5000, 5000, 5000))
> colnames(test) <- c("Seqnames", "Start", "End")
> test <- makeGRangesFromDataFrame(test)
> test
GRanges object with 5 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 1 1000-1500 *
[2] 1 2000-3000 *
[3] 1 3000-5000 *
[4] 1 4000-5000 *
[5] 1 5000 *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> test <- findOverlaps(test, test)
> test
Hits object with 13 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 2 2
[3] 2 3
[4] 3 2
[5] 3 3
... ... ...
[9] 4 4
[10] 4 5
[11] 5 3
[12] 5 4
[13] 5 5
-------
queryLength: 5 / subjectLength: 5
> indices <- unique(queryHits(test)[queryHits(test) != subjectHits(test)])
> indices
[1] 2 3 4 5
That should work. The idea is to find perfect overlaps between the gr and its reduced form. And then to remove these entries from the original gr.
library(GenomicRanges)
x <- data.frame(chr=c("chr1","chr1","chr2","chr3","chr3"),start=c(1,5,10,10,15),end=c(10,15,20,20,30))
gr <- makeGRangesFromDataFrame(x, seqnames.field = "chr",start.field = "start",end.field = "end")
gr[-queryHits(findOverlaps(gr,reduce(gr),type="equal"))]
Login before adding your answer.
Traffic: 2515 users visited in the last hour