How do I subset a GRanges on chromosome, region and strand?
3
2
Entering edit mode
6.0 years ago
endrebak852 ▴ 110
library(GenomicRanges)

gr=GRanges(seqnames=c("chr1","chr2","chr2"),
           ranges=IRanges(start=c(50,150,200),end=c(100,200,300)),
           strand=c("+","-","-")
)

I want to get all intervals between 200-300 on chromosome 2, the minus strand. How do I do that?

gr[seqnames(gr) == "chr2" & start(gr) > 200 & end(gr) < 300 & strand(gr) == "-"]

does not work as it does not find the overlaps with 200-300, but rather the intervals strictly contained in that range.

GenomicRanges R bioconductor • 19k views
ADD COMMENT
0
Entering edit mode

Read about findOverlaps?

ADD REPLY
1
Entering edit mode

Construct a second GRanges object with the target ranges and then use either findOverlaps as zx8754 suggests, or subsetByOverlaps.

ADD REPLY
7
Entering edit mode
6.0 years ago
zx8754 12k

Make a query range, then subsetByOverlaps (sorry in comments by mistake mentioned findOverlaps):

q=GRanges(seqnames="chr2",
          ranges=IRanges(start = 200, end = 300),
          strand="-")

subsetByOverlaps(gr, q)
ADD COMMENT
3
Entering edit mode
6.0 years ago
bernatgel ★ 3.4k

If you want to use a simple selector like the one you propose, you should flip the tests for start and end:

gr[seqnames(gr) == "chr2" & start(gr) < 300 & end(gr) > 200 & strand(gr) == "-"]

and that should work.

Using subsetByOverlaps could be more readable and is your best option if you have more than one region.

ADD COMMENT
1
Entering edit mode

Good one, didn't spot there was a mistake in logical comparisons. Even if subsetByOverlaps is better for general case, this is the answer to the "problem". (I will move this to an answer).

ADD REPLY

Login before adding your answer.

Traffic: 1548 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6