5.4 years ago
gundalav
I have the following GenomicRanges object created with this:
library(GenomicRanges)
gr <- GRanges(seqnames = "chr1", strand = c("+", "-", "-", "+"), ranges = IRanges(start = c(1, 3, 3, 5), width = 3))
gr
GRanges object with 4 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 1-3 +
[2] chr1 3-5 -
[3] chr1 3-5 -
[4] chr1 5-7 +
What I want to do is obtain the unique rows from it, yielding this (hand-coded):
GRanges object with 3 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 1-3 +
[2] chr1 3-5 -
[3] chr1 5-7 +
How can I achieve that? In reality, I have around 9 million rows to process.
I can use this method, but it is very slow:
library(tidyverse)
gr %>%
as_tibble() %>%
distinct()
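For comparison, here is a minimal sketch of calling unique() directly on the GRanges object, which avoids the round trip through a tibble. It rebuilds the example object from the question; the name gr_unique is only illustrative. Ranges are compared on seqnames, start, width and strand, and no claim is made here about timing on 9 million rows.

```r
library(GenomicRanges)

# Same example object as in the question
gr <- GRanges(seqnames = "chr1",
              strand = c("+", "-", "-", "+"),
              ranges = IRanges(start = c(1, 3, 3, 5), width = 3))

# unique() has a method for GRanges; the result is still a GRanges object
gr_unique <- unique(gr)

# Equivalent form using a logical index of first occurrences
gr_unique2 <- gr[!duplicated(gr)]

gr_unique
```

The result stays a GRanges object, so downstream GenomicRanges functions can be used on it without converting back.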
Just be aware that unique() ignores the metadata columns (mcols) of the GRanges object when comparing ranges.
Is there a way to do this that takes the metadata into account?
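One possible approach, sketched here under assumptions: convert the object to a data.frame so that the metadata columns sit next to the coordinates, compute duplicated() on whole rows, and use the result to subset the original GRanges. The score column is hypothetical and added only so the example has metadata to compare. This has not been benchmarked, and duplicated() on a data.frame may be slow for millions of rows.

```r
library(GenomicRanges)

gr <- GRanges(seqnames = "chr1",
              strand = c("+", "-", "-", "+"),
              ranges = IRanges(start = c(1, 3, 3, 5), width = 3))

# Hypothetical metadata column: rows 2 and 3 share a range but differ in score
mcols(gr)$score <- c(10, 20, 30, 40)

# unique() compares ranges only, so one of the two chr1:3-5 rows is dropped
gr_ranges_only <- unique(gr)

# as.data.frame() places the mcols beside seqnames/start/end/width/strand,
# so duplicated() on the data.frame compares complete rows
keep <- !duplicated(as.data.frame(gr))
gr_with_meta <- gr[keep]

length(gr_ranges_only)
length(gr_with_meta)
```

Because the subsetting is done on the original object, the result keeps its GRanges class and its metadata columns.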