Hi,
I have SAM file after alignment and the samtools flagstat produces the following results:
22806328 + 4861610 in total (QC-passed reads + QC-failed reads)
154024 + 1050636 secondary
0 + 0 supplementary
0 + 0 duplicates
22634029 + 4846318 mapped (99.24% : 99.69%)
22652304 + 3810974 paired in sequencing
11326152 + 1905487 read1
11326152 + 1905487 read2
22435338 + 0 properly paired (99.04% : 0.00%)
22467696 + 3780390 with itself and mate mapped
12309 + 15292 singletons (0.05% : 0.40%)
30982 + 232010 with mate mapped to a different chr
13721 + 50746 with mate mapped to a different chr (mapQ>=5)
I then used bam2fastq to extract the unmapped reads:
bam2fastq -o D10_S1_%#_unmapped_reads.fastq -f --no-aligned --unaligned --no-filter D10_S1.bam
However the ouput reveals that only 172299 sequences were exported
27667938 sequences in the BAM file 172299 sequences exported WARNING: 12309 reads could not be matched to a mate and were not exported
This does not correlate with the flagstat results.
Thanks is advance for the help.
Was this the output of BWA-meth? This is a very scary looking BAM file, that's for sure. If you could provide more information about the experiment/sequencing type, that could help a lot in diagnosing the best solution.
Hi John,
Yes it is from bwa-meth. I converted the sam file to bam. This sample is extracted from rust infected wheat. This means that it should contain both the rust and wheat genome. The above flagstat is after the sample was aligned to the rust genome. I am assuming that the 4861610 unmapped reads would aligned to wheat. I wanted to extract these unmapped reads and then align them to the wheat genome to see if it this is true.
Thanks
Ah, i see. I didn't know BWA-meth gave secondary alignments like that. hm.
Well, just so you are aware, the best option is probably to also map to the wheat genome and then for each read check if it aligned to rust, wheat, both, or neither, to get a proper statistic for this sort of thing. Or perhaps there are proper metagenomics tools i don't know about that do this.
Anyway, just for the hell of it, it might be a good idea to generate the same statistics as above from the original SAM file (not the BAM file converted back to SAM), to make sure the SAM to BAM conversion didn't somehow go wrong. Also the flagstats for D10_S1.bam would be interesting.