Question

Computing Relative Coordinates Of One Bioconductor Granges Object To Another?

2

Entering edit mode

12.6 years ago

Ryan Thompson ★ 3.6k

I have a set of mapped reads as a GRanges object in BioConductor, and I am grouping them into "clusters" of overlapping reads (regions of contiguous nonzero coverage). I also have the cluster coordinates as a GRanges object, obtained by calling reduce on an unstranded version of the reads, and then arbitrarily assigning a strand to each cluster (actually I have a method for assigning a specific strand to each, but it's not important).

library(ShortRead)
bamfile <- file.path(system.file("extdata", package="Rsamtools"), "ex1.bam")
read.ranges <- as(readAligned(bamfile, type="BAM"), "GRanges")
cluster.ranges <- reduce(GRanges(seqnames=seqnames(read.ranges),
                                 ranges=ranges(read.ranges),
                                 ## Don't consider strand when merging
                                 ## reads into clusters
                                 strand="*",
                                 seqlengths=seqlengths(read.ranges)))
## Each cluster has a strand, assigned via an unspecified method
strand(cluster.ranges) <- c("+", "-")
## Assign an arbitrary ID to each cluster
elementMetadata(cluster.ranges)$id <- sprintf("cluster%s", seq(length(cluster.ranges)))

So basically, I have two GRanges objects, read.ranges and cluster.ranges, and every range in the first obejct is contained in exactly one range in the second object:

> read.ranges
GRanges with 3242 ranges and 2 elementMetadata cols:
         seqnames       ranges strand   |                       id      flag
            <Rle>    <IRanges>  <Rle>   |             <BStringSet> <integer>
     [1]     seq1     [ 1, 36]      +   |      B7_591:4:96:693:509        73
     [2]     seq1     [ 3, 37]      +   |   EAS54_65:7:152:368:113        73
     [3]     seq1     [ 5, 39]      +   |      EAS51_64:8:5:734:57       137
     [4]     seq1     [ 6, 41]      +   |     B7_591:1:289:587:906       137
     [5]     seq1     [ 9, 43]      +   |    EAS56_59:8:38:671:758       137
     [6]     seq1     [13, 47]      +   |    EAS56_61:6:18:467:281        73
     [7]     seq1     [13, 48]      +   |  EAS114_28:5:296:340:699       137
     [8]     seq1     [15, 49]      +   |     B7_597:6:194:894:408        73
     [9]     seq1     [18, 52]      -   |    EAS188_4:8:12:628:973        89
     ...      ...          ...    ... ...                      ...       ...
  [3234]     seq2 [1520, 1554]      +   |  EAS114_45:6:86:859:1779       137
  [3235]     seq2 [1523, 1555]      -   |   EAS54_71:8:105:854:975        83
  [3236]     seq2 [1524, 1558]      -   |   EAS51_62:4:187:907:145       153
  [3237]     seq2 [1524, 1557]      +   |   EAS54_71:4:284:269:882        73
  [3238]     seq2 [1524, 1558]      +   |     EAS56_63:4:141:9:811       137
 [ reached getOption("max.print") -- omitted 4 rows ]
  ---
  seqlengths:
   seq1 seq2
     NA   NA
> cluster.ranges
GRanges with 2 ranges and 1 elementMetadata col:
      seqnames    ranges strand |          id
         <Rle> <IRanges>  <Rle> | <character>
  [1]     seq1 [1, 1569]      + |    cluster1
  [2]     seq2 [1, 1567]      - |    cluster2
  ---
  seqlengths:
   seq1 seq2
     NA   NA

From this, I would like to transform the coordinates of the reads into coordinates relative to the clusters to which they belong. This includes inverting the strand and reversing coordinates of reads within a cluster whose strand is "-", Is there a function to do this?

bioconductor coordinates • 3.6k views

ADD COMMENT • link updated 12.6 years ago by Malcolm.Cook ★ 1.5k • written 12.6 years ago by Ryan Thompson ★ 3.6k

score 2 · Answer 1 · 2012-04-14

I don't think there is a function to do exactly that.

If I understand the specifics of your strand considerations, the following should be close

# find which cluster each range is within, ignoring strand
hits<-cluster.ranges[findOverlaps(read.ranges,cluster.ranges,select='first',ignore.strand=TRUE)]
# start with the read.ranges themselves
read.ranges.on.hits<-read.ranges
# update each range relative to its cluster hit
ranges(read.ranges.on.hits)<-ranges(narrow(hits,start(read.ranges),end(read.ranges)))
# identify which hits are to clusters on negative strand
neg<-as.vector(strand(hits)=='-')
# reflect the coordinates and the strand of the reads in neg strand clusters
ranges(read.ranges.on.hits)[neg]<-reflect(ranges(read.ranges.on.hits[neg]),ranges(hits[neg]))
strand(read.ranges.on.hits)[neg]<-ifelse('-'==strand(read.ranges.on.hits)[neg],'+','-')

which yields

> read.ranges.on.hits
GRanges with 3242 ranges and 2 elementMetadata values:
         seqnames    ranges strand   |                       id      flag
            <Rle> <IRanges>  <Rle>   |             <BStringSet> <integer>
     [1]     seq1  [ 1, 36]      +   |      B7_591:4:96:693:509        73
     [2]     seq1  [ 3, 37]      +   |   EAS54_65:7:152:368:113        73
     [3]     seq1  [ 5, 39]      +   |      EAS51_64:8:5:734:57       137
     [4]     seq1  [ 6, 41]      +   |     B7_591:1:289:587:906       137
     [5]     seq1  [ 9, 43]      +   |    EAS56_59:8:38:671:758       137
     [6]     seq1  [13, 47]      +   |    EAS56_61:6:18:467:281        73
     [7]     seq1  [13, 48]      +   |  EAS114_28:5:296:340:699       137
     [8]     seq1  [15, 49]      +   |     B7_597:6:194:894:408        73
     [9]     seq1  [18, 52]      -   |    EAS188_4:8:12:628:973        89
     ...      ...       ...    ... ...                      ...       ...
  [3234]     seq2  [14, 48]      -   |  EAS114_45:6:86:859:1779       137
  [3235]     seq2  [13, 45]      +   |   EAS54_71:8:105:854:975        83
  [3236]     seq2  [10, 44]      +   |   EAS51_62:4:187:907:145       153
  [3237]     seq2  [11, 44]      -   |   EAS54_71:4:284:269:882        73
  [3238]     seq2  [10, 44]      -   |     EAS56_63:4:141:9:811       137
  [3239]     seq2  [10, 44]      -   |  EAS114_30:6:277:397:932        73
  [3240]     seq2  [ 6, 40]      +   | EAS139_11:7:50:1229:1313        83
  [3241]     seq2  [ 2, 36]      +   |    EAS54_65:3:320:20:250       147
  [3242]     seq2  [ 1, 35]      +   |    EAS114_26:7:37:79:581        83
  ---
  seqlengths:
   seq1 seq2
     NA   NA