Hi, i was trying to extract unmapped reads from a BAM alignment, using following commands
1) samtools view -f 4 bamfile >temp1
2) samtools view -f 8 bamfiles >temp2
3) samtools merge -u - temp[12] | samtools sort -n > unmappedBAM
But when i checked the unmappedBAM files using samtools flagstat, i get 27% mapped reads in it, which are singletons.
1578683 + 0 in total (QC-passed reads + QC-failed reads)
28683 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
428595 + 0 mapped (27.15% : N/A)
1550000 + 0 paired in sequencing
775000 + 0 read1
775000 + 0 read2
0 + 0 properly paired (0.00% : N/A)
0 + 0 with itself and mate mapped
399912 + 0 singletons (25.80% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
I would like to know if singletons are to be kept in the unmappedBAM file or should i remove these as well. I plan to re assemble the unmapped genes, so in that regard I seek suggestion.
So i tried to remove singletons by only extracting mapped reads
i.e samtools view -f 4 bamfile > unmappedbam
But now when converting to fastq reads using bamToFastq, i get warning message
WARNING: Query E00582:535:HGVFLCCX2:3:2116:29924:13808 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping. **WARNING: Query E00582:535:HGVFLCCX2:3:2116:29924:20454 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.*
So now im again left with the dilemma .. Can anyone suggest something in this regard ????