GRanges intersect operation
2
0
Entering edit mode
8.4 years ago
bsmith030465 ▴ 240

I am trying to find the intersection of two GRanges objects. The first object has the coordinates for genes (grgenes below), the second mimics my data (grtest below). However, when I try to do the intersection, I appear to get an empty set. What am I doing wrong? (Sorry for the bad formatting! Can I turn the 'auto-format' off??)

grgenes

GRanges object with 23056 ranges and 1 metadata column:

    seqnames                 ranges strand   |       GENEID
       <Rle>              <IRanges>  <Rle>   | <FactorList>
  1    chr19 [ 58858172,  58874214]      -   |            1
 10     chr8 [ 18248755,  18258723]      +   |           10
100    chr20 [ 43248163,  43280376]      -   |          100

seqinfo: 93 sequences (1 circular) from hg19 genome

grtest <- GRanges(seqnames = "chr19",ranges=IRanges(start=c(58857500,58857505),width=1))

grtest

GRanges object with 2 ranges and 0 metadata columns:

  seqnames               ranges strand
     <Rle>            <IRanges>  <Rle>

[1] chr19 [58857500, 58857500] *

[2] chr19 [58857505, 58857505] *


seqinfo: 1 sequence from an unspecified genome; no seqlengths

intersect(grgenes,grtest)

GRanges object with 0 ranges and 0 metadata columns:

seqnames ranges strand <rle> <iranges> <rle>


seqinfo: 93 sequences (1 circular) from hg19 genome

GRanges • 17k views
ADD COMMENT
3
Entering edit mode
8.4 years ago
russhh 5.7k

Since grgene contains strand information, you need to either add ignore.strand = TRUE , or set the strandedness of your test data. You also need to ensure that your test data actually overlaps with your intiial data ([58857500, ..] doesn't overlap [58858172, 58874214]). I've added a couple of example GRanges that should definitely overlap your grgene ranges in the following:

grgene <- GRanges(
    seqnames = c('chr19', 'chr8', 'chr20'),
    ranges = IRanges(start = c(58858172, 18248755, 43248163),
                       end = c(58874214, 18258723, 43280376)),
    strand = c('-', '+', '-')
    ) # ignoring the GENEID column

grtest.unstrand <- GRanges(
    seqnames = "chr19",
    ranges=IRanges(
        start=c(58857500,58857505,58858500,58859505), # note the difference
        width=1))

grtest.strand <- GRanges(
    seqnames = "chr19",
    ranges=IRanges(
        start=c(58857500,58857505,58858500,58859505),
        width=1),
    strand = c('-', '+', '-', '+'))

> GenomicRanges::intersect(grgene, grtest.unstrand)
GRanges object with 0 ranges and 0 metadata columns:
   seqnames    ranges strand
      <Rle> <IRanges>  <Rle>
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

> GenomicRanges::intersect(grgene, grtest.unstrand, ignore.strand = TRUE)
GRanges object with 2 ranges and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
  [1]    chr19 [58858500, 58858500]      *
  [2]    chr19 [58859505, 58859505]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

> GenomicRanges::intersect(grgene, grtest.strand)
GRanges object with 1 range and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
  [1]    chr19 [58858500, 58858500]      -
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

> GenomicRanges::intersect(grgene, grtest.strand, ignore.strand = TRUE)
GRanges object with 2 ranges and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
  [1]    chr19 [58858500, 58858500]      *
  [2]    chr19 [58859505, 58859505]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

HTH

ADD COMMENT
0
Entering edit mode
8.4 years ago
bsmith030465 ▴ 240

Thanks!

Also, how can I get the meta-data values to show in the intersect results. For example, with my original grgenes object:

intersect(grgenes,grtest.unstrand,ignore.strand=TRUE)

GRanges object with 2 ranges and 0 metadata columns:

  seqnames               ranges strand
     <Rle>            <IRanges>  <Rle>

[1] chr19 [58858500, 58858500] *

[2] chr19 [58859505, 58859505] *


seqinfo: 93 sequences (1 circular) from hg19 genome

How can I get the gene ids to show in the meta-data column?

ADD COMMENT
1
Entering edit mode

Unfortunately, that won't be possible in GenomicRanges::intersect, since the metadata for a supersequence doesn't necessarily correspond to a subsequence. You can however, use findOverlaps to identify rows within your grgenes object that are overlapped by at least one entry of grtest, and then extract those rows. Look at the documentation

ADD REPLY

Login before adding your answer.

Traffic: 3000 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6