Hi,
I have a problem with some BAM files. I downloaded only a slice of each (one chromosome), and this results in MATE_NOT_FOUND
errors when I check the BAMs with:
gatk ValidateSamFile -I input.bam -M SUMMARY
This gives the following errors:
Error Type Count
ERROR:MATE_NOT_FOUND 1408
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 18551
ERROR:MISMATCH_FLAG_MATE_UNMAPPED 733
I have tried two solutions. Both decrease the number of MATE_NOT_FOUND errors,
but a few still remain, and I don't understand how that is possible.
1. I've tried with samtools:
samtools view input.bam -f 0x1 -f 0x2 -b -o input.fixed.bam
It should only keep "read paired (0x1)" and "read mapped in proper pair (0x2)".
Now I have this output:
Error Type Count
ERROR:MATE_NOT_FOUND 81
ERROR:MISMATCH_FLAG_MATE_UNMAPPED 524
2. I've tried with PrintReads (GATK):
gatk PrintReads -I input.bam -O input.fixed.bam --read-filter PairedReadFilter
It should only keep reads that are properly paired.
Now I have this output:
Error Type Count
ERROR:MATE_NOT_FOUND 81
ERROR:MISMATCH_FLAG_MATE_UNMAPPED 524
Why are they not all removed?
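One likely explanation is that both filters above look only at each read's own FLAG field. The 0x1 and 0x2 bits were set by the aligner on the full data set, so a read can still claim to be in a proper pair even though its mate fell outside the downloaded slice. The sketch below is one way to list such reads by name. The function name `orphan_names` is hypothetical, and the sketch assumes SAM text on stdin and counts only primary alignments:

```shell
# List names of paired reads whose mate is absent from the input.
# Reads SAM text on stdin. Header lines, unpaired reads, and
# secondary (0x100) / supplementary (0x800) records are skipped.
orphan_names() {
  awk -F'\t' '
    /^@/ { next }                                   # SAM header line
    {
      flag = $2 + 0
      if (flag % 2 == 0) next                       # 0x1 not set: unpaired
      if (int(flag / 256) % 2 == 1) next            # 0x100: secondary
      if (int(flag / 2048) % 2 == 1) next           # 0x800: supplementary
      if (int(flag / 64) % 2 == 1) first[$1]++      # 0x40: first in pair
      if (int(flag / 128) % 2 == 1) second[$1]++    # 0x80: second in pair
      seen[$1] = 1
    }
    END {
      # A complete pair has exactly one first and one second primary record.
      for (name in seen)
        if (first[name] != 1 || second[name] != 1) print name
    }
  ' | sort
}
```

It could be used as `samtools view input.fixed.bam | orphan_names > orphans.txt`. If the list is non-empty, those reads would need to be dropped by name rather than by flag, since their flags alone do not reveal that the mate is missing.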
I came across this same problem with a BAM file mapped to some unknown reference version and did something similar to @lacb. I followed this blog post from Heng Li using samtools 1.18 and bwa 0.7.17-r1188. This was also referenced in Realigning BAM files to new reference: