I received a BAM of one chromosome and need to determine the percent of the genome covered. Our genome size is 16,600 bp and we did paired-end sequencing each reads being 150 bp long. To my knowledge it should be calculated as follow:
- Genome size : 16,670 bp
- No of base covered: 16,639 bp
- Percent of the genome covered = [(No. of base covered) / (Genome size) ] 100 = (16,639/16,670) 100 = 99.81%
I did the following with samtools depth:
samtools depth -q0 -Q0 my.bam > qry.depth
wc -l qry.depth
16639 qry.depth
Though I don't fully understand the output of samtools depth
. To my knowledge the output should be as follow:
- Column 1: ID
- Column 2: Genomic position
- Column 3: No of reads covering that position
- Number of line in qry.depth represents the number of bases covered in the reference
Here is a sample of my qry.depth file:
chrM 1 3
chrM 2 6
chrM 3 16
chrM 4 32
chrM 5 34
chrM 7783 0
chrM 7915 0
The line count of qry.depth
is 16,639 but I still have 2 positions with no reads covering them, then what is the percent of the genome covered? Maybe I misunderstood something in the output file format.
Hi Devon,
Sorry. I cannot understand completely. I also encounter the question:
The reference length: 134962
Line count of *.depth is : 134184
No. of positions have only 0: 26, as below:
My question is: why there are 0? If there are reads to support these positions, why samtools output them?
Thanks!
Normally
-a -a
will output absolutely every position, or at least it should. If there are reads supporting coverage of a positions then they should get counted unless they're above the maximus depth (-d
) or being filtered out (play around with-q
and-Q
).