Hisat2 reports fewer total aligned reads than samtools view -F 4 my.alignment.bam
, and I would like to try and understand why.
According to my Hisat2 results:
54623054 reads; of these:
54623054 (100.00%) were paired; of these:
30215613 (55.32%) aligned concordantly 0 times
18622638 (34.09%) aligned concordantly exactly 1 time
5784803 (10.59%) aligned concordantly >1 times
----
30215613 pairs aligned concordantly 0 times; of these:
222801 (0.74%) aligned discordantly 1 time
----
29992812 pairs aligned 0 times concordantly or discordantly; of these:
59985624 mates make up the pairs; of these:
39341118 (65.58%) aligned 0 times
17349298 (28.92%) aligned exactly 1 time
3295208 (5.49%) aligned >1 times
63.99% overall alignment rate
a total of 18622638 + 5784803 + 222801+ (17349298 / 2) + ( 3295208 / 2)= 34952495 reads aligned to the target transcript.
The SAM flag value of 4 indicates that the given read failed to map.
When I select reads that do not have the SAM flag "4":
gawk '{F4[$1]}
END {
x=0
for(unique_read_id in F4) {
x=x+1
}
print x
}' <(samtools view -F 4 my.alignment.bam)
I get a different number: 44852131 reads in this case; I count ~10 million more unique fragment ids than Hisat2 has reported. What combination of SAM flags can I use to get a result that matches Hisat2?
I am trying to evaluate the efficiency of ribosomal RNA depletion in an RNA Sequencing experiment. I have aligned my fastQ file against the Rn45s pre-spliced transcript in order to estimate ribosomal RNA abundance. I'm aligning reads with Hisat2, and using samtools and gawk for downstream analysis.
According to the Hisat2 manual, it adheres to the SAM Flags standard. I know that there is something I am missing here, I'm just having a hard time seeing it.
Since all of my reads are paired, I get the same results by calling samtools view -F 8 my.alignment.bam