Does anyone know how the Y axis (% of reference) for the "coverage distribution" graph of the BAMStats program is calculated?
10.4 years ago
kay ▴ 380

Hello,

I am using the BAMStats program to calculate the coverage for my BAM file.

I am trying to understand how the Y axis (% of reference) for the "coverage distribution (mapped only)" graph of the BAMStats program is calculated.

If anyone can help me understand, that would be great.

Thanks
Kay

next-gen bam RNA-Seq • 2.7k views

What level of detail are you after? In other words, do you want the mathematical details of the calculation algorithm, or rather the general process for calculating this metric?

10.4 years ago
Dan D 7.4k

I'll go ahead and answer the latter possibility of my comment. A BAM file contains information about precisely where on a reference a particular read has been mapped. Thus, for each base of the reference genome, you can calculate how many sample reads have a base which aligns at that locus.

The number of times that a base is covered by sample reads is the depth of coverage for that base. If 30 reads each have one of their bases mapped to a given reference base, then that reference base has 30X coverage. If you then bin these coverage depths, you can make a histogram: 5,000 bases have exactly 30X coverage, 5,500 bases have exactly 25X coverage, and so on.
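To make the depth and histogram idea concrete, here is a minimal Python sketch. It is not how BAMStats itself is implemented; it assumes each alignment is already reduced to a hypothetical 0-based, half-open (start, end) interval on a single reference, and it ignores CIGAR details such as gaps and clipping. The function names and the toy intervals are made up for illustration.

```python
from collections import Counter

def per_base_depth(ref_length, alignments):
    """Count how many alignments cover each reference position.

    alignments: iterable of (start, end) pairs, 0-based, end exclusive
    (an assumed simplification of what a BAM record encodes).
    """
    depth = [0] * ref_length
    for start, end in alignments:
        # Clamp to the reference so stray coordinates cannot index out of range.
        for pos in range(max(start, 0), min(end, ref_length)):
            depth[pos] += 1
    return depth

def depth_histogram(depth):
    """Map each depth value to the number of reference bases with that depth."""
    return Counter(depth)

# Toy data: three overlapping reads on a 10-base reference.
alignments = [(0, 5), (2, 8), (4, 10)]
depth = per_base_depth(10, alignments)
hist = depth_histogram(depth)
```

On a real BAM file you would get the per-base depths from an existing tool rather than looping over reads yourself; the sketch only shows what is being counted.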

In the case of BAMStats, you go one step further and calculate the percentage of total reference bases which have a given sequencing depth. If your reference genome is 10,000 bases in length, and exactly 100 bases have a depth of coverage of exactly 30X, then 1% of your reference has 30X coverage.
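The percentage step can be sketched the same way. This is only an illustration of the arithmetic described above, not BAMStats's actual code: the function name and the counts at depths other than 30X are hypothetical, and whether the "mapped only" graph uses exactly this denominator is an assumption I have not checked against the program.

```python
def percent_of_reference(depth_counts, ref_length):
    """Convert {depth: number of bases} into {depth: % of reference bases}."""
    return {
        depth: 100.0 * n_bases / ref_length
        for depth, n_bases in sorted(depth_counts.items())
    }

# 10,000-base reference with 100 bases at exactly 30X, as in the example above.
# The other two bins are invented so that the counts add up to 10,000.
depth_counts = {0: 4400, 25: 5500, 30: 100}
pct = percent_of_reference(depth_counts, 10_000)

for depth, percent in pct.items():
    print(f"{depth}X: {percent:.1f}% of reference")
```

Each Y value in such a plot would then be the percentage for the depth on the X axis, and the percentages over all depths (including 0X, if it is counted) sum to 100.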
