why HISAT2 and BWA BAM files differ in number of entries?
1
0
Entering edit mode
6.5 years ago

Dear all,

I have aligned the same input fastq files with both BWA and HISAT2 with the commands, respectively:

bwa mem -t 10 <ref> <file_1.fq.gz> <file_2.fq.gz> -o <bwa_aln.sam>
hisat2 -p 10 -q -x <ref> -1 <file_1.fq.gz>  -2 <file_2.fq.gz> -S <hst_aln.sam>

then converted to BAM with:

samtools view -Sb <xxx_aln.sam> > <xxx_aln.bam>

I then compared the header and counted the number of reads in each BAM file with:

$ samtools view -H <bwa_aln.bam>
@SQ SN:21   LN:46709983
@PG ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem -t 10 ./ref/GRCh38-21.fa 501N-1_1.fq.gz 501N-1_2.fq.gz -o aln/501N_bwa.sam
$ samtools view -H <hst_aln.bam>
@HD VN:1.0  SO:unsorted
@SQ SN:21   LN:46709983
@PG ID:hisat2   PN:hisat2   VN:2.1.0    CL:"/home/gigiux/src/hisat2/hisat2-align-s --wrapper basic-0 -p 10 -q -x ./ref/GRCh38-21.fa -S aln/501N_hst.sam -1 /tmp/9196.inpipe1 -2 /tmp/9196.inpipe2"

and

$ samtools view -c 501N_bwa.bam 
379141213
$ samtools view -c 501N_hst.bam 
384279994

This is unexpected: shouldn't both files have the same number of reads? Any reasons why BWA shows 5 138 781 reads less than HISAT2?

PS: also BWA's header misses the HD flag...

Thank you

Assembly alignment sequencing • 1.8k views
ADD COMMENT
1
Entering edit mode

In addition to what Wouter suggested, it's good to understand the HISAT alignment summary. Check my earlier post here

ADD REPLY
0
Entering edit mode

thank you, this was also interesting

ADD REPLY
2
Entering edit mode
6.5 years ago

Reads can get mapped to multiple locations and will then get counted more than once.

ADD COMMENT
0
Entering edit mode

fair enough, thank you

ADD REPLY

Login before adding your answer.

Traffic: 1568 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6