This is the first time I have seen this problem, after dozens of runs of my exome sequencing pipeline (mapping, post-processing, and variant calling). For a single sample, the BAM file output by MarkDuplicates appears to be missing the mates of many paired reads. Here is the MarkDuplicates command line I am using:
java -Xmx4g -jar MarkDuplicates.jar CREATE_INDEX=true INPUT=input.bam OUTPUT=output.bam METRICS_FILE=metrics.txt REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=Lenient
When I check the input BAM file with ValidateSamFile, it passes. For the output file, however, I get errors saying the mate was not found for a paired read. Any ideas? Has anyone else had this happen?
This is the behavior I have always seen from MarkDuplicates when removing duplicates (REMOVE_DUPLICATES=true). It generates orphan reads, and these orphan reads are unmapped.
Actually, it happens when you have a pair where only one read is mapped. When the mapped read is flagged as a duplicate and removed, MarkDuplicates leaves its unmapped mate in the BAM.
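If you want to confirm that this is what happened in your file, something like the following might help. It is only a rough sketch in plain Python, not part of Picard: the function name find_orphans and the toy records are made up for illustration, and it assumes you feed it SAM-format text lines. It counts the primary records for each paired read name and reports names that appear only once, i.e. reads whose mate is no longer in the file.

```python
from collections import Counter

PAIRED = 0x1            # SAM flag bit: read is paired in sequencing
SECONDARY = 0x100       # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM flag bit: supplementary alignment


def find_orphans(sam_lines):
    """Return sorted names of paired reads that have no mate in the input.

    Hypothetical helper for illustration; expects tab-separated SAM lines.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & PAIRED:
            continue  # single-end reads cannot be orphans
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count only primary records
        counts[name] += 1
    # A complete pair contributes two primary records; one record means orphan
    return sorted(name for name, n in counts.items() if n == 1)


# Toy records (made up): pair "r1" is intact, while "r2" is an unmapped read
# (flag 133 = paired + unmapped + second in pair) whose mapped mate is gone.
example = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "r1\t147\tchr1\t200\t60\t50M\t=\t100\t-150\t*\t*",
    "r2\t133\tchr1\t300\t0\t*\t=\t300\t0\t*\t*",
]
print(find_orphans(example))
```

On a real file you could pipe the text output of samtools view output.bam into a small script that passes sys.stdin to this function. Note that it keeps one counter entry per read name, so memory use grows with the number of reads.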