I used GenomicRanges function in R and I got the gene range that I am looking for it. The next step is to calculate the total number of base pairs in this specific range. I used coverage function and It gave me long results. I hope if any one can help me with loop function to get the total number of bp in this range.
I used the simple example in "An Introduction to Genomic Ranges Classes" I this example, but I didn't understand the results of the coverage function as shown below:
This is the part I don't understand:
I mean what is the interpretation of
Ah, the question is essentially, "What the heck is 'run length encoding' anyway?", since that's what's output. Let's just take chr1 as an example. It has 3 entries and covers 12 bases. We can just draw the entries as follows:
The left-most position is 1 and the right-most 12. So there are 4 bases (1-4) covered by only the first entry. Another way to state this is that you have a run of length 4 (4 bases) with value (coverage) 1. The 5th base above is covered by the 1st and second entries. This creates a run of length 1 (1 base) with value (coverage) 2. The 6th through 7th bases are covered by all three entries, so you have a run of length 2 and value 3. Entry 1 has now ended and bases 8 through 11 are covered by entries 2 and 3, creating a run of length 4 and value 2. The last base, 12, is only covered by entry 3, so it's a run of length 1 and value 1.
In short, if you were to graph the number of reads covering each position, a run length encoding provides a compact way of storing the coverage and a convenient way to then graph it. "Lengths" would then increment position on the X axis and the "Values" would state where to draw things on the Y axis.
BTW, if you instead wanted to know the total bases per chromosome covered by at least one entry, then
sapply(split(gr, seqnames(gr)), function(x) sum(width(reduce(x, ignore.strand=T))))
will work. I'll leave unpacking exactly what that does as a lesson for you :)Thanks a lot Devon for your help.