Hello,
Using Picard CollectAlignmentSummaryMetrics I get the error "No real operator (M|I|D|N) in CIGAR". I guess this happens when an operator other than M, I, D, N is encountered (and in fact I have soft-clipped reads). I can override the error by setting VALIDATION_STRINGENCY=SILENT.
If my guess is correct, I would like to know why CollectAlignmentSummaryMetrics/picard is set to throw an error with operators other than M, I, D, N.
Thanks!
If relevant, here's the offending output:
java -jar -Xmx2g ~/applications/picard/picard-tools-1.92/CollectAlignmentSummaryMetrics.jar \
> IS_BISULFITE_SEQUENCED=True \
> INPUT=$bam \
> OUTPUT=${bam%.bam}.AlnSmryMetr.txt \
> REFERENCE_SEQUENCE=$ref
...
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Read name M00886:11:000000000-A88VV:1:1109:11516:16954, No real operator (M|I|D|N) in CIGAR
at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
at net.sf.samtools.BAMRecord.getCigar(BAMRecord.java:247)
at net.sf.samtools.SAMRecord.getAlignmentEnd(SAMRecord.java:456)
at net.sf.samtools.SAMRecord.computeIndexingBin(SAMRecord.java:1234)
at net.sf.samtools.SAMRecord.isValid(SAMRecord.java:1644)
at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:540)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
at net.sf.picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:109)
at net.sf.picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:55)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
at net.sf.picard.analysis.CollectAlignmentSummaryMetrics.main(CollectAlignmentSummaryMetrics.java:92)
And this is the problematic read:
samtools view $bam | grep 'M00886:11:000000000-A88VV:1:1109:11516:16954'
M00886:11:000000000-A88VV:1:1109:11516:16954 83 chr10 3012200 33 49M19S = 3012200 -49 TAAACAAAATTATAACAAACATCAAACTCTAAATTTAAATAAAAGACCTACAAAAAACATACACTAAA FGGGFGGGGGGGGFGGGGGGFCGGGGGGGGGGGGGGGGGFGFFGGGGGFGGGGFGGGGGGGGECCCCC NM:i:0 MD:Z:49 AS:i:49 XS:i:46 RG:Z:grm029_pb_DALIHP.140422.DALIHPplas1_S1_L001_R_001_val_ YC:Z:CT YD:Z:r
M00886:11:000000000-A88VV:1:1109:11516:16954 163 chr10 3012200 33 68S = 3012200 49 TAAACAAAATTATAACAAACATCAAACTCTAAATTTAAATAAAAGACCTACAAAAAACATACACTAAA 66ACCGGCFGEGGFGFGFGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGG AS:i:49 MD:Z:49 NM:i:0 RG:Z:grm029_pb_DALIHP.140422.DALIHPplas1_S1_L001_R_001_val_ XS:i:46 YC:Z:GA YD:Z:r
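Thanks! Of course! I was misled by the error message. It means the CIGAR contains none of M, I, D or N (the second mate above is 68S), not that some other operator is present. Completely soft-clipped reads are the result of clipping overlapping pairs. If the overlap is complete, one of the two mates is essentially ignored.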
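For anyone hitting the same error, here is a minimal sketch for listing the reads that trigger it before running Picard. It assumes tab-separated SAM text on stdin with the CIGAR in column 6, and it treats only M, I, D and N as "real" operators, as in the error message. The function name no_real_operator is made up for illustration and is not part of samtools or Picard.

```shell
# Hypothetical helper: print SAM records whose CIGAR contains
# none of the operators M, I, D, N (e.g. a fully soft-clipped 68S).
# Header lines (starting with @) and reads with CIGAR "*" are skipped.
# Assumption: "=" and "X" are not counted as real operators here,
# to match the wording of the Picard 1.92 error message.
no_real_operator() {
    awk -F'\t' '$1 !~ /^@/ && $6 != "*" && $6 !~ /[MIDN]/'
}

# Usage (assumes $bam points at the BAM file, as in the post above):
#   samtools view "$bam" | no_real_operator
```

On the two records shown above, this would print only the second mate (flag 163, CIGAR 68S), since 49M19S contains an M.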