A quick question: I want to count the total aligned bases in a BAM file. I try to apply pysam (samtools) and pybedtools (bedtools) to do that. How can I make it? Thanks.
EDIT:
I have solved this question with pybedtools
and pandas
:
genomecov_base = bd.BedTool.genome_coverage(bam, dz=True) ## Depth per base on genome
gencov_perbase = pan.read_table(genomecov_base.fn, names=['chrom', 'position', 'depth'])
gencov_allbase = (gencov_perbase.iloc[:, 2]).sum() ## Total bases mapped on genome
Hi, this sounds like something I need as well. Does this count the actual number of reads that contain a particular base or just how many reads aligned at that position? For example at position 1 if I know I have 100 reads aligned there, would it tell me I have 50 A's, 25 a, 25 C? Something like that? Or does it just say that there are 100 bases aligned there? Cheers :)
Hi DNAngel, it does not return the per base output, but one option I would recommend for such a calculation is sambamba depth. This will return the per base output at each, although the files become quite large.