Hello everyone,
I understand the concept behind a run length encoded vector as being just a short representation of a long value repeated vector.
For example :
> rl = Rle(c(1,1,1,1,2,2,2,3,1,1))
> rl
numeric-Rle of length 10 with 4 runs
Lengths: 4 3 1 2
Values : 1 2 3 1
means for the 1
value I have a lenght of 4
for the value of 2
I have a length of 3
an so on.
Now if we construct a Granges
object:
gr1 <- GRanges(seqnames="chr2", ranges=IRanges(c(3,10), c(6,16)),
strand="+")
that looks like that:
gr1
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 3-6 +
[2] chr2 10-16 +
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
and we ask for the coverage
of this object we get:
> coverage(gr1)
RleList of length 1
$chr2
integer-Rle of length 16 with 4 runs
Lengths: 2 4 3 7
Values : 0 1 0 1
I honestly struggle to understand the values and lengths here, I have been reading the manual and other tutorials but haven't found a single place which in detail goes one by one on how coverage is computed and how the Rle
finally gives these values and lengths. If someone could help me out explaining why the first length is 2
and the first value is 0
in this case would be great. thanks
oh yes, now it does.This is an absolute coverage of the genome in question right? it definitely makes more sense when you give a seq length to every chromosome. I'm wondering now if it makes sense to use this function to get the coverage of one granges object relative to another granges object. I believe bestools coverage behave this way with several different bed files but here for granges it seems like coverage should be used to compare the Grange object with its entire length.