Question

Cannot understand the coverage() function for GRanges objects

3.6 years ago

alexmondaini ▴ 20

Hello everyone,

I understand the concept behind a run-length encoded vector: it is a compact representation of a long vector in which values are repeated.

For example :

> rl = Rle(c(1,1,1,1,2,2,2,3,1,1))
> rl
numeric-Rle of length 10 with 4 runs
  Lengths: 4 3 1 2
  Values : 1 2 3 1

This means the value 1 has a run length of 4, the value 2 has a run length of 3, and so on.
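As a small sanity check of that reading, here is a sketch using base R's `rle()`, which is only an analogue of the `Rle()` constructor used above (an assumption made so the example runs without Bioconductor). Expanding the runs again should give back the original vector.

```r
# Base R analogue of the Rle above: rle() ships with base R,
# whereas Rle() comes from the Bioconductor S4Vectors package.
x <- c(1, 1, 1, 1, 2, 2, 2, 3, 1, 1)
r <- rle(x)
r$lengths  # 4 3 1 2
r$values   # 1 2 3 1

# Repeating each value by its run length reconstructs the original vector.
expanded <- rep(r$values, times = r$lengths)
identical(expanded, x)  # TRUE
```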

Now if we construct a GRanges object:

gr1 <- GRanges(seqnames="chr2", ranges=IRanges(c(3,10), c(6,16)),
               strand="+")

which looks like this:

> gr1
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2       3-6      +
  [2]     chr2     10-16      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

and ask for the coverage of this object, we get:

> coverage(gr1)
RleList of length 1
$chr2
integer-Rle of length 16 with 4 runs
  Lengths: 2 4 3 7
  Values : 0 1 0 1

I honestly struggle to understand the values and lengths here. I have been reading the manual and other tutorials, but I haven't found a single place that goes through, step by step, how coverage is computed and how the Rle ends up with these values and lengths. If someone could explain why the first length is 2 and the first value is 0 in this case, that would be great. Thanks.

Granges • 2.0k views

score 3 · Answer 1 · 2021-12-04

3.6 years ago

ATpoint 88k

The coverage() output represents the coverage of the entire chromosome: for every position, the number of ranges that overlap it. Since the first range in your GRanges object starts at 3, positions 1 and 2 have a coverage of zero, hence Values 0 with Lengths 2. In your case "chr2" has length 16 because 16 is the largest end coordinate you entered. You can set a seqlength, though. If you tell gr1 that, e.g., chr2 has length 1000, this is also reflected in the coverage:

> seqlengths(gr1) <- 1000
> coverage(gr1)
RleList of length 1
$chr2
integer-Rle of length 1000 with 5 runs
  Lengths:   2   4   3   7 984
  Values :   0   1   0   1   0

16 is the last base that is covered, so 1000-16=984; hence Lengths 984 with Values 0, because all the bases after 16 are uncovered.
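To go through the computation position by position, here is a minimal base R sketch. `manual_coverage()` is a hypothetical helper written for illustration, not how GenomicRanges implements coverage(); it assumes a single chromosome and 1-based inclusive coordinates. It counts how many ranges overlap each position and then compresses the counts into runs.

```r
# Hypothetical helper (not part of GenomicRanges): per-base coverage by counting.
# Assumes one chromosome and 1-based, inclusive start/end coordinates.
manual_coverage <- function(starts, ends, seqlength = max(ends)) {
  depth <- integer(seqlength)          # one counter per position, all zero
  for (i in seq_along(starts)) {
    idx <- starts[i]:ends[i]           # positions spanned by range i
    depth[idx] <- depth[idx] + 1L      # each overlapping range adds 1
  }
  rle(depth)                           # compress the per-base depths into runs
}

# Ranges 3-6 and 10-16 from gr1, no seqlength given.
cov16 <- manual_coverage(c(3, 10), c(6, 16))
cov16$lengths  # 2 4 3 7
cov16$values   # 0 1 0 1

# Same ranges, chromosome length set to 1000.
cov1000 <- manual_coverage(c(3, 10), c(6, 16), seqlength = 1000)
cov1000$lengths  # 2 4 3 7 984
cov1000$values   # 0 1 0 1 0
```

Reading the first result run by run: positions 1-2 are uncovered (value 0, length 2), positions 3-6 are covered once (value 1, length 4), positions 7-9 are uncovered (value 0, length 3), and positions 10-16 are covered once (value 1, length 7).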

Does that make sense to you?

Oh yes, now it does. This is the absolute coverage of the genome in question, right? It definitely makes more sense when you give a seqlength to every chromosome. I'm now wondering whether it makes sense to use this function to get the coverage of one GRanges object relative to another GRanges object. I believe bedtools coverage behaves this way with several different BED files, but for GRanges it seems that coverage should be used to compare the GRanges object against its entire sequence length.

ADD REPLY • link 3.6 years ago by alexmondaini ▴ 20
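On the question in the reply above, one possible way to summarise coverage relative to a second set of intervals is sketched below in base R. `fraction_covered()` is a hypothetical helper, and the target intervals 1-8 and 9-16 are made up for illustration; it is not claimed to reproduce bedtools output. Within GenomicRanges, functions such as countOverlaps() and findOverlaps() compare one GRanges object against another and may be the more natural starting point.

```r
# Hypothetical helper: for each target interval, the fraction of its bases
# covered by at least one query range (one chromosome, 1-based inclusive).
fraction_covered <- function(q_start, q_end, t_start, t_end) {
  depth <- integer(max(q_end, t_end))  # per-base depth over the needed span
  for (i in seq_along(q_start)) {
    idx <- q_start[i]:q_end[i]
    depth[idx] <- depth[idx] + 1L
  }
  # Share of positions with depth > 0 inside each target interval.
  mapply(function(s, e) mean(depth[s:e] > 0), t_start, t_end)
}

# Query ranges are those of gr1; the targets 1-8 and 9-16 are invented.
frac <- fraction_covered(c(3, 10), c(6, 16), c(1, 9), c(8, 16))
frac  # 0.500 0.875
```

Here the first target (1-8) has 4 of its 8 bases covered by the range 3-6, and the second target (9-16) has 7 of its 8 bases covered by the range 10-16.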