I have a set of mapped reads as a GRanges object in BioConductor, and I am grouping them into "clusters" of overlapping reads (regions of contiguous nonzero coverage). I also have the cluster coordinates as a GRanges object, obtained by calling reduce
on an unstranded version of the reads, and then arbitrarily assigning a strand to each cluster (actually I have a method for assigning a specific strand to each, but it's not important).
library(ShortRead)
bamfile <- file.path(system.file("extdata", package="Rsamtools"), "ex1.bam")
read.ranges <- as(readAligned(bamfile, type="BAM"), "GRanges")
cluster.ranges <- reduce(GRanges(seqnames=seqnames(read.ranges),
ranges=ranges(read.ranges),
## Don't consider strand when merging
## reads into clusters
strand="*",
seqlengths=seqlengths(read.ranges)))
## Each cluster has a strand, assigned via an unspecified method
strand(cluster.ranges) <- c("+", "-")
## Assign an arbitrary ID to each cluster
elementMetadata(cluster.ranges)$id <- sprintf("cluster%s", seq(length(cluster.ranges)))
So basically, I have two GRanges objects, read.ranges
and cluster.ranges
, and every range in the first obejct is contained in exactly one range in the second object:
> read.ranges
GRanges with 3242 ranges and 2 elementMetadata cols:
seqnames ranges strand | id flag
<Rle> <IRanges> <Rle> | <BStringSet> <integer>
[1] seq1 [ 1, 36] + | B7_591:4:96:693:509 73
[2] seq1 [ 3, 37] + | EAS54_65:7:152:368:113 73
[3] seq1 [ 5, 39] + | EAS51_64:8:5:734:57 137
[4] seq1 [ 6, 41] + | B7_591:1:289:587:906 137
[5] seq1 [ 9, 43] + | EAS56_59:8:38:671:758 137
[6] seq1 [13, 47] + | EAS56_61:6:18:467:281 73
[7] seq1 [13, 48] + | EAS114_28:5:296:340:699 137
[8] seq1 [15, 49] + | B7_597:6:194:894:408 73
[9] seq1 [18, 52] - | EAS188_4:8:12:628:973 89
... ... ... ... ... ... ...
[3234] seq2 [1520, 1554] + | EAS114_45:6:86:859:1779 137
[3235] seq2 [1523, 1555] - | EAS54_71:8:105:854:975 83
[3236] seq2 [1524, 1558] - | EAS51_62:4:187:907:145 153
[3237] seq2 [1524, 1557] + | EAS54_71:4:284:269:882 73
[3238] seq2 [1524, 1558] + | EAS56_63:4:141:9:811 137
[ reached getOption("max.print") -- omitted 4 rows ]
---
seqlengths:
seq1 seq2
NA NA
> cluster.ranges
GRanges with 2 ranges and 1 elementMetadata col:
seqnames ranges strand | id
<Rle> <IRanges> <Rle> | <character>
[1] seq1 [1, 1569] + | cluster1
[2] seq2 [1, 1567] - | cluster2
---
seqlengths:
seq1 seq2
NA NA
From this, I would like to transform the coordinates of the reads into coordinates relative to the clusters to which they belong. This includes inverting the strand and reversing coordinates of reads within a cluster whose strand is "-", Is there a function to do this?
Oops, I forgot to mention the part about giving each cluster a unique ID and then using that as the
seqnames
of the new GRanges object, but that's easy to do myself.It looks to me like you already have already given each cluster a unique ID.
To use it as the seqnames of the result, just initialize read.ranges.on.hits instead with:
Oh yeah, I guess I did.
ok - so - do you accept the answer as complete? if so, then click the green arrow! Cheers, malcolm.