I'm trying to extend my alignment pipeline with Picard duplicate marking, as recommended by the Broad Institute (see their best practices). I have paired-end Illumina reads, and as test data I use only the reads previously mapped to chr17. After bwa, I run Picard MarkDuplicates and get the following error:
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 630636, Read name HWI-H212:69:C0NR3ACXX:1:1212:10100:13262, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:541)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:397)
at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:161)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:145)
I have tried running FixMateInformation, as suggested on SeqAnswers, but it fails with the same validation error on the same record:
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 630636, Read name HWI-H212:69:C0NR3ACXX:1:1212:10100:13262, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:541)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
at net.sf.picard.sam.FixMateInformation.doWork(FixMateInformation.java:148)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
at net.sf.picard.sam.FixMateInformation.main(FixMateInformation.java:76)
Finally, ValidateSamFile reports the following error for a number of reads:
ERROR: Record 43432, Read name HWI-H212:69:C0NR3ACXX:1:1310:18404:48164, Mate negative strand flag does not match read negative strand flag of mate
Any suggestions?
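To make the ValidateSamFile message concrete: in the SAM FLAG field, bit 0x10 marks a read as reverse-complemented and bit 0x20 says its mate is reverse-complemented. The check fails when one read's 0x20 bit disagrees with its mate's actual 0x10 bit. Below is a minimal Python sketch, not what Picard itself runs, that scans SAM text (for example the output of `samtools view -h`) and reports read names whose primary pair is inconsistent. The function names are hypothetical, and skipping secondary and supplementary alignments is an assumption.

```python
from collections import defaultdict

READ_REVERSE = 0x10    # SEQ is reverse complemented
MATE_REVERSE = 0x20    # SEQ of the mate (next segment) is reverse complemented
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

def mate_strand_ok(flag: int, mate_flag: int) -> bool:
    """True if this read's 0x20 bit agrees with its mate's 0x10 bit."""
    return bool(flag & MATE_REVERSE) == bool(mate_flag & READ_REVERSE)

def find_strand_mismatches(sam_lines):
    """Return read names whose two primary records have inconsistent strand flags."""
    by_name = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue                      # assumption: only primary records are compared
        by_name[qname].append(flag)
    bad = []
    for qname, flags in by_name.items():
        if len(flags) != 2:
            continue                      # mate absent, e.g. filtered out of the subset
        a, b = flags
        if not (mate_strand_ok(a, b) and mate_strand_ok(b, a)):
            bad.append(qname)
    return bad
```

For example, a pair with flags 99 and 147 passes, while 99 paired with 131 is reported, because 99 claims a reverse-strand mate but 131 lacks the 0x10 bit.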
I just noticed that the SAM format specification v1.4-r985 includes C code for computing a read's bin number from its alignment position (and for listing the bins that overlap a region). If you really want to fix the problem rather than work around it with VALIDATION_STRINGENCY, you could compare the bin field in the offending record with the value the specification says it should have.
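As a sketch of that comparison, here is the specification's reg2bin ported to Python, plus a hypothetical helper that derives the alignment end from POS and CIGAR. It assumes POS is 1-based as in SAM text, counts M, D, N, = and X as the reference-consuming CIGAR operations, and (as an assumption about how BAM writers treat it) uses end = start + 1 when the read covers no reference bases. The stored bin lives in the binary BAM record and is not printed in SAM text, so you would need a BAM-level reader to obtain the value to compare against.

```python
import re

# CIGAR operations that consume reference bases
REF_CONSUMING = set("MDN=X")

def reg2bin(beg: int, end: int) -> int:
    """Port of the SAM spec's reg2bin for a 0-based, half-open interval [beg, end)."""
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0

def reference_length(cigar: str) -> int:
    """Number of reference bases covered by a CIGAR string ('*' gives 0)."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in REF_CONSUMING)

def expected_bin(pos: int, cigar: str) -> int:
    """Hypothetical helper: the bin a record with 1-based POS and this CIGAR should carry."""
    beg = pos - 1                       # SAM POS is 1-based; reg2bin expects 0-based
    end = beg + reference_length(cigar)
    if end <= beg:                      # assumption: zero-length alignments use [beg, beg + 1)
        end = beg + 1
    return reg2bin(beg, end)
```

For instance, `expected_bin(16385, "100M")` returns 4682. A read that crosses a 16 kb boundary falls back to a larger bin, which is why the bin depends on the alignment end and not only on the start.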