4.2 years ago
rpolicastro
I just started using plyranges, and I cannot figure out how to reduce and aggregate a GRanges object with a desired gap width.
Example data.
library("plyranges")
df <- data.frame(
seqnames="chrI", start=c(1, 10, 20), end=c(5, 15, 25), strand=c("+", "+", "-"),
score=c(8, 3, 6)
)
gr <- as_granges(df)
> gr
GRanges object with 3 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <integer>
[1] chrI 1-5 + | 8
[2] chrI 10-15 + | 3
[3] chrI 20-25 - | 6
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
Desired output, with a maximum allowed gap width of 10 and the scores summed during aggregation in this example.
desired_output <- data.frame(
seqnames="chrI", start=c(1, 20), end=c(15, 25), strand=c("+", "-"),
score=c(11, 6)
)
desired_output <- as_granges(desired_output)
> desired_output
GRanges object with 2 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <numeric>
[1] chrI 1-15 + | 11
[2] chrI 20-25 - | 6
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
This is similar to section 4.1 in the HelloRanges tutorial, which does work for me since you can set a minimum gap width in the GenomicRanges::reduce function. The plyranges equivalent is reduce_ranges_directed, but it does not appear to have a gap width option.
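For comparison, here is a minimal sketch of the GenomicRanges route mentioned above. It assumes that "max allowed gap width of 10" means gaps of up to 10 bp, inclusive, get merged. The names max_gap and merged are illustrative only. Since reduce() drops metadata columns, the scores are summed through the revmap that reduce() can return.

```r
library(GenomicRanges)

# Same example ranges as above, built directly as a GRanges object.
gr <- GRanges(
  seqnames = "chrI",
  ranges = IRanges(start = c(1, 10, 20), end = c(5, 15, 25)),
  strand = c("+", "+", "-"),
  score = c(8, 3, 6)
)

# Assumption: gaps of up to max_gap bp (inclusive) should be merged.
# reduce() keeps ranges separate when the gap is at least min.gapwidth,
# so min.gapwidth is set to max_gap + 1.
max_gap <- 10L

# Strand-aware merge; with.revmap = TRUE records which input ranges
# went into each merged range.
merged <- reduce(gr, min.gapwidth = max_gap + 1L, with.revmap = TRUE)

# Sum the original scores within each merged range via the revmap.
merged$score <- sum(extractList(gr$score, merged$revmap))

# Drop the helper column so only the score remains.
merged$revmap <- NULL

merged
```

Because the merge happens on the original coordinates, no extra bp are added to the resulting intervals.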
EDIT: This has been cross-posted to bioconductor support also.
Will this do?
EDIT: replaced flank_right with stretch. But I can see how it might be annoying that the result will have aberrant bp added to the final interval.