6 weeks ago
Smilesky
I have a set of BAM files and want to know how many exome bases are covered by each BAM file. The human exome is ~30 Mb, but I would like the exact number.
I used samtools depth and got 3111433021. That number seems far too big, and I am confused.
Is the hg38.bed file only the exonic regions (i.e. exon as the feature in the GTF file)? Also, when you say it is too big, what do you mean specifically? Mentioning what you would expect might help us understand where the confusion is.

The BED file contains only the exonic regions. I mean that the count of exon bases I got using samtools depth, 3111433021, is very large. If it should be approximately equal to 30 Mb, the number I got is far too high.
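One way to pin down the expected figure is to compute the size of the target itself from the BED file. The sketch below is only an illustration: the function name `bed_target_size` and the example intervals are made up, and it assumes standard BED coordinates (0-based start, end-exclusive). Overlapping exons are merged first so that shared bases are counted once.

```python
def bed_target_size(bed_lines):
    """Return the number of distinct bases covered by BED intervals."""
    intervals = {}
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))

    total = 0
    for spans in intervals.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent interval: extend the current one.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Hypothetical intervals; the first two overlap by 50 bases.
example = [
    "chr1\t100\t200",
    "chr1\t150\t250",
    "chr2\t0\t50",
]
print(bed_target_size(example))
```

For a real file, the lines could be read with `open("hg38.bed")` and passed to the same function. The result is the upper bound for any count of covered exonic bases.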
samtools depth calculates the depth (i.e. number of reads) at each position in the regions defined in the BED file, which could result in a large number. But I am thinking this might not be what you are going for? Would I be correct in guessing that you are trying to check how many bases of the exome have at least one read covering them?
I want to count the number of exonic bases in the BAM file.
If you only had exactly one read covering each base, then yes. You will (and must) have multiple reads over each region (that is where the depth part comes in), so getting a large number of bases is not surprising. You may wish to look for the inverse, i.e. whether there are any bases that are not covered by at least one read.
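The difference between summed depth and the number of covered bases can be sketched as follows. This is a minimal illustration, not the method used in the thread. It assumes the three-column output that samtools depth produces for a single BAM file (chromosome, 1-based position, depth). The function name `summarize_depth` and the example lines are hypothetical.

```python
def summarize_depth(depth_lines, min_depth=1):
    """Summarize samtools depth output for one BAM file.

    Returns (positions reported, positions with depth >= min_depth,
    sum of the depth column).
    """
    positions = 0
    covered = 0
    total_depth = 0
    for line in depth_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depth = int(fields[2])
        positions += 1
        total_depth += depth
        if depth >= min_depth:
            covered += 1
    return positions, covered, total_depth


# Made-up depth lines: four positions, one of them with no reads.
example = ["chr1\t101\t35", "chr1\t102\t40", "chr1\t103\t0", "chr1\t104\t12"]
positions, covered, total_depth = summarize_depth(example)
print("positions reported:", positions)
print("covered bases:", covered)
print("uncovered bases:", positions - covered)
print("summed depth:", total_depth)
```

Here the summed depth is much larger than the number of covered bases, which is the effect described above. On real data the lines could come from something like `samtools depth -a -b hg38.bed sample.bam` (with `sample.bam` as a placeholder name). The `-a` option matters for counting uncovered bases, because by default positions with zero depth are left out of the output.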