Thorough Documentation Of Samtools Flagstat
3
0
Entering edit mode
11.1 years ago

I am trying to find thoroughfull documentation of samtools flagstat for every row for file Un.bam

samtools flagstat Un.bam
8444 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
8154 + 0 mapped (96.57%:nan%)
8444 + 0 paired in sequencing
4383 + 0 read1
4061 + 0 read2
5973 + 0 properly paired (70.74%:nan%)
7793 + 0 with itself and mate mapped
361 + 0 singletons (4.28%:nan%)
1428 + 0 with mate mapped to a different chr
337 + 0 with mate mapped to a different chr (mapQ>=5)

samtools view -F 4 -q 1 -f 2 -b Un.bam >filtered.bam
samtools flagstat filtered.bam 
4613 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
4613 + 0 mapped (100.00%:nan%)
4613 + 0 paired in sequencing
2330 + 0 read1
2283 + 0 read2
4613 + 0 properly paired (100.00%:nan%)
4613 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:nan%)
61 + 0 with mate mapped to a different chr
53 + 0 with mate mapped to a different chr (mapQ>=5)

EDIT: without demanding mapQ>1

samtools view -F 4 -f 2 -b Un.bam >filtered.bam
[mqm5775@hammer22 bam]$ samtools flagstat filtered.bam 
5973 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
5973 + 0 mapped (100.00%:nan%)
5973 + 0 paired in sequencing
3010 + 0 read1
2963 + 0 read2
5973 + 0 properly paired (100.00%:nan%)
5973 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:nan%)
207 + 0 with mate mapped to a different chr
53 + 0 with mate mapped to a different chr (mapQ>=5)

Since I asked for reads mapped in proper pair I would expect read1 and read2 numbers to be the same, but maybe I am just misunderstanding what these rows mean. Could someone please point me to the documentation or explain this issue? Thanks a lot.

samtools • 6.5k views
ADD COMMENT
2
Entering edit mode
11.1 years ago

As far as I know the flagstat command is a convenience function that produces the same values you would get by selecting the right filters on a samtools view command.

For example properly paired means:

 samtools view -c -f 3 data.bam

then itself and mate mapped is:

 samtools view -c -F 12 data.bam

etc

Is it possible that this alignment contains both paired and unpaired reads? It seems that even your original file has more of read1 than read2?

samtools view -c -f 64 data.bam
samtools view -c -f 128 data.bam

should produce the same numbers that match the flagstat output.

ADD COMMENT
1
Entering edit mode
11.1 years ago
BruceB ▴ 340

The -q 1 part excludes reads with a mapping quality of less than 1 (default value is 0). This may be excluding some reads, so try it again without that part.

I have tested this on several of my BAM files and they all come out as having equal numbers of read 1 and read 2, as you would expect.

ADD COMMENT
0
Entering edit mode

I rerun flagstat without demand for mapq >=1 and values still differ. Should I assume that something is already wrong with my Un.bam file? If yes, can you guess what it could be? :)

ADD REPLY
0
Entering edit mode

Could you give us more details about how the BAM file has been generated.

ADD REPLY
0
Entering edit mode

I was bwa mem with paired-end reads with -M option (Mark shorter split hits as secondary (for Picard compatibility).)

ADD REPLY
1
Entering edit mode
11.1 years ago

If you got this bam file from bowtie2, and if you choose the option --no-mixed while aligning the data, you would end up only with reads that have corresponding mates. Hence you can expect exact pair of reads. Its better if you give some details about your alignment process.

ADD COMMENT
0
Entering edit mode

You are right, I used bwa mem for my alignment. I now think that these unequal numbers for read1 and read2 may be result of split alignment. I will rerun it again to confirm this. I used -M option (Mark shorter split hits as secondary (for Picard compatibility)).

ADD REPLY

Login before adding your answer.

Traffic: 2249 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6