How to count the number of alignments in a bam file without using samtools?
1
0
Entering edit mode
4.0 years ago
steve ★ 3.5k

Preferably, using either standard GNU tools, or perhaps something in the Python standard library. Any ideas?

bam • 1.3k views
ADD COMMENT
0
Entering edit mode

if the bam is uncompressed, you could juste use wc -l which will return the number of lines (= the number of alignments assuming there is no unaligned entry in the file). But samtools is much better.

edit: I got it wrong, see comments below.

ADD REPLY
1
Entering edit mode

dont think I have ever seen a case where people actually use uncompressed bam file, at that point you might as well be using sam format

ADD REPLY
0
Entering edit mode

It is used when one need to pipe a bam file in another process (saves compression time, as said below by Jorge).

ADD REPLY
0
Entering edit mode

I think by definition, a bam is compressed.

ADD REPLY
0
Entering edit mode

not with samtools view -bu I think

ADD REPLY
1
Entering edit mode

Even if uncompressed, which is recommended for piping purposes in order to save time compressing and decompressing output and input respectively, a bam file is still binary, so wc -l will not work.

ADD REPLY
0
Entering edit mode

You are both right. I tested it to make sure:

samtools view -bu input.bam | wc -l
459910
samtools flagstat  input.bam | head -n 1
27053366 + 0 in total (QC-passed reads + QC-failed reads)
ADD REPLY
0
Entering edit mode

Just a couple of comments:

-u implies -b, therefore samtools view -u input.bam is enough to get an uncompressed bam file.

Also, if you only need to know the number of reads, generating all flagstat metrics is not that efficient, and samtools view -c input.bam would be sufficient. It won't be that faster, since most of the time is spent in decompressing and reading the bam file, but it neither will be slower plus it's simpler to write down.

ADD REPLY
1
Entering edit mode
4.0 years ago
xiaoguang ▴ 160

Bam file is a binary file stroing alignment information, so we must need htslib.h api to interprete that. if you use GNU, you need htslib.h. if you use python, pysam must be the best choice for you.

ADD COMMENT

Login before adding your answer.

Traffic: 1301 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6