8 months ago
Bastien Hervé
Hello,
I would rather not reinvent the wheel. I have a sorted GRanges object of overlapping transcript positions, each with a name (the names column). My goal is to aggregate the names over the positions where transcripts overlap, while keeping the positions that are unique to a single transcript.
For example:
library(GenomicRanges)

gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr1"),
  ranges = IRanges(start = c(50, 75, 80, 85),
                   end = c(110, 90, 110, 120)),
  names = c("id1", "id2", "id3", "id4")
)
The expected output would be something like :
gr_output <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1"),
  ranges = IRanges(start = c(50, 75, 80, 85, 91, 111),
                   end = c(74, 79, 84, 90, 110, 120)),
  names = c("id1", "id1;id2", "id1;id2;id3", "id1;id2;id3;id4", "id1;id3;id4", "id4")
)
Maybe something with findOverlaps, reduce and aggregate, or summarise? Or maybe another tool like bedtools?
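For reference, one possible way to get this kind of output is sketched below. It is only a sketch, assuming that disjoin() from GenomicRanges is an acceptable way to cut the ranges into non-overlapping pieces and that the names live in a metadata column called names, as in the example above. The variable names pieces, hits and covering are made up for illustration.

```r
library(GenomicRanges)

gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr1"),
  ranges = IRanges(start = c(50, 75, 80, 85),
                   end = c(110, 90, 110, 120)),
  names = c("id1", "id2", "id3", "id4")
)

# Cut the ranges into the smallest non-overlapping pieces.
# disjoin() is strand-aware by default; the example here is unstranded.
pieces <- disjoin(gr)

# Map each piece back to the original ranges that cover it.
hits <- findOverlaps(pieces, gr)
covering <- split(subjectHits(hits), queryHits(hits))

# Join the names of the covering ranges, keeping the input order.
pieces$names <- unname(vapply(
  covering,
  function(idx) paste(gr$names[sort(idx)], collapse = ";"),
  character(1)
))

pieces
```

Every piece produced by disjoin() lies inside at least one input range, so covering has one entry per piece and lines up with pieces. On stranded data the pieces are computed per strand unless ignore.strand = TRUE is passed to both disjoin() and findOverlaps().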
Thanks, you nailed it perfectly!