How come the QC-passed reads from samtools flagstat is different from FASTQ (R1+R2) read count?
2
0
Entering edit mode
7.2 years ago

Hello friends,

Sample1_R1.fastq.gz : 120,183

Sample1_R2.fastq.gz : 120,183

Total R1 and R2 reads : 240,366

Sample1.sorted.dedup.bam statistics:

243828 + 0 in total (QC-passed reads + QC-failed reads)

3462 + 0 secondary

0 + 0 supplementary

32154 + 0 duplicates

73449 + 0 mapped (30.12% : N/A)

240366 + 0 paired in sequencing

120183 + 0 read1

120183 + 0 read2

50608 + 0 properly paired (21.05% : N/A)

62058 + 0 with itself and mate mapped

7929 + 0 singletons (3.30% : N/A)

60 + 0 with mate mapped to a different chr

60 + 0 with mate mapped to a different chr (mapQ>=5)

How come I can get 243,828 as QC-passed reads when I have total reads 240,366 (from R1 and R2)?

samtools sam bam alignment • 3.3k views
ADD COMMENT
3
Entering edit mode

243828-3462=240,366

ADD REPLY
2
Entering edit mode
7.2 years ago
ATpoint 86k

The difference are the secondary alignments. Those are reads that map equally well to more than 1 position in the genome. One alignment is randomly chosen as primary, the others flagged as secondary.

ADD COMMENT
0
Entering edit mode

Thanks. I am looking for alignment percentage, Isn't it this number misleading?

If I want the alignment rate, do I report 30.12% or 24.35% (21.05% + 3.30%)?

ADD REPLY
1
Entering edit mode

30%. The mapping percentage is the percentage of reads in the fastq file that can be assigned to the genome. It does not matter if the reads align once or 100 times, the statement "mapped" is either true or false.

ADD REPLY
1
Entering edit mode

From samtools manual , flag 0x4 is UNMAP (segment unmapped). Which ever is not unmapped ("mapped 0x4 bit not set"- copy/pasted from Samtools manual ), is mapped when samtools (flagstat and idxstat) calculates mapped count.

You can observe the same in following example (-F exclude and -f include and 4 is flag):

$ samtools view -F 4  -c  hcc1395_normal_rep1.cutadapt.bam 
642343

$ samtools view -f 4  -c  hcc1395_normal_rep1.cutadapt.bam 
36404

$ samtools idxstats hcc1395_normal_rep1.cutadapt.bam 
chr22   50818468    642343  29046
*   0   0   7358

642343- mapped; 29046 + 7358 (= 36404) - unmapped

This mapped count is used in calculating the %.

ADD REPLY
1
Entering edit mode
7.2 years ago

You can find the answer in this post

ADD COMMENT

Login before adding your answer.

Traffic: 1877 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6