4.0 years ago
steve
Preferably, using either standard GNU tools, or perhaps something in the Python standard library. Any ideas?
A BAM file is a binary file storing alignment information, so you need the htslib API (htslib.h) to interpret it. If you use GNU tools, you need htslib; if you use Python, pysam is probably the best choice.
If the BAM is uncompressed, you could just use wc -l, which will return the number of lines (= the number of alignments, assuming there is no unaligned entry in the file). But samtools is much better.
Edit: I got it wrong, see comments below.
I don't think I have ever seen a case where people actually use an uncompressed BAM file; at that point you might as well be using SAM format.
It is used when one needs to pipe a BAM file into another process (it saves compression time, as said below by Jorge).
I think by definition, a bam is compressed.
Not with samtools view -bu, I think.
Even if uncompressed, which is recommended for piping purposes in order to save time compressing and decompressing output and input respectively, a BAM file is still binary, so wc -l will not work.
You are both right. I tested it to make sure:
Just a couple of comments:
-u implies -b, therefore samtools view -u input.bam is enough to get an uncompressed BAM file.
Also, if you only need to know the number of reads, generating all flagstat metrics is not that efficient, and samtools view -c input.bam would be sufficient. It won't be much faster, since most of the time is spent decompressing and reading the BAM file, but it won't be slower either, and it's simpler to write.
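For the standard-library route asked about in the question, here is a minimal sketch of what samtools view -c does at its simplest: walk the binary records and count them. It relies on the BGZF container being a series of concatenated gzip members, which Python's gzip module can read, and on the record layout in the SAM/BAM specification. The function names count_bam_records and _read_exact are made up for this example. I also assume, without having verified it here, that output from samtools view -u is still wrapped in BGZF blocks and can be read the same way. Note that it counts every alignment record (secondary, supplementary and unmapped included), with no filtering.

```python
import gzip
import struct


def _read_exact(stream, n):
    """Read exactly n bytes, raising if the stream ends early."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def count_bam_records(path):
    """Count alignment records in a BAM file using only the standard library.

    Hypothetical helper for illustration; it does no flag filtering.
    """
    with gzip.open(path, "rb") as bam:
        # Every BAM stream starts with the 4-byte magic "BAM\1".
        if _read_exact(bam, 4) != b"BAM\x01":
            raise ValueError("not a BAM file (bad magic)")

        # Skip the plain-text SAM header.
        (l_text,) = struct.unpack("<I", _read_exact(bam, 4))
        _read_exact(bam, l_text)

        # Skip the reference dictionary: each entry is a length-prefixed
        # name followed by a 4-byte reference length.
        (n_ref,) = struct.unpack("<I", _read_exact(bam, 4))
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<I", _read_exact(bam, 4))
            _read_exact(bam, l_name + 4)

        # Each alignment is a 4-byte block_size followed by that many bytes.
        count = 0
        while True:
            size_bytes = bam.read(4)
            if not size_bytes:
                break  # clean end of file
            if len(size_bytes) != 4:
                raise EOFError("truncated record size")
            (block_size,) = struct.unpack("<I", size_bytes)
            _read_exact(bam, block_size)
            count += 1
        return count
```

Usage would be something like count_bam_records("input.bam"). Expect it to be noticeably slower than samtools on large files, since all the decompression and looping happens in Python.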