I have a BAM file that returns the following when I run samtools flagstat:
37353420 + 61518 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
9068558 + 0 duplicates
29533570 + 48760 mapped (79.07% : 79.26%)
37353420 + 61518 paired in sequencing
18676710 + 30759 read1
18676710 + 30759 read2
25268546 + 41600 properly paired (67.65% : 67.62%)
27671498 + 45600 with itself and mate mapped
1862072 + 3160 singletons (4.99% : 5.14%)
1184710 + 1916 with mate mapped to a different chr
1023122 + 1644 with mate mapped to a different chr (mapQ>=5)
However, when I run samtools fastq to extract these reads in FASTQ format, most of the reads are discarded as singletons. Why?
$ samtools fastq -1 reads.1.fq -2 reads.2.fq -s unpaired.fq --reference genome.fna input.sorted.bam
[M::bam2fq_mainloop] discarded 36407186 singletons
[M::bam2fq_mainloop] processed 37414938 reads
Is the sorted BAM file sorted by name or by position?
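One way to check is to read the SO: field on the @HD line of the BAM header. The snippet below is only a rough sketch: sort_order is a hypothetical helper written for this thread (it is not part of samtools), and it assumes the header carries an @HD line with an SO: tag, which not every file does.

```shell
# Hypothetical helper (not part of samtools): read a SAM header on stdin and
# print the SO: value from the @HD line, or "unknown" if no such tag exists.
sort_order() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }
    '
}

# Intended usage, assuming samtools is on PATH and the file name from the post:
#   samtools view -H input.sorted.bam | sort_order
```

If this prints coordinate, mates are generally not adjacent in the file, which could explain why samtools fastq reports them as singletons. In that case, grouping the reads by name first (for example with samtools collate or samtools sort -n) before running samtools fastq is the usual thing to try.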