Hi, I am performing an exome analysis and am getting a new error when trying to run GATK's BaseRecalibrator. Here is the error message:
SAM/BAM file SAMFileReader{/Volumes/Passport/Bos_001C/Bos001C.realigned.fixed.dedup.bam} is malformed: BAM file has a read with mismatching number of bases and base qualities. Offender: HWI-ST1122:264:C2LCWACXX:1:1208:17696:94881 [101 bases] [0 quals]
I tried adding -filterMBQ and I get a different error message:
ERROR stack trace
java.lang.ArrayIndexOutOfBoundsException: -29
at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158)
at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:225)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530)
at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateBAQArray(BaseRecalibrator.java:428)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:243)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$MapReduceJob.run(NanoScheduler.java:468)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)
Any ideas what I should do? Thanks for your help.
Can you run samtools view input.bam | grep HWI-ST1122:264:C2LCWACXX:1:1208:17696:94881 for fun?
Yes, and it's pretty wacky. This output is from a different file than the one referenced in my question, but the error message is the same:
HWI-ST1122:264:C2LCWACXX:1:2113:17084:65824 99 1 13405 22 101M = 13495 186 CCTCCACCACCCCGAGATCACATTTCTCACTGCCTTTTGTCTGCCCAGTTTCACCAGAAGTAGGCCTCTTCCTGACAGGCAGCTGCACCACTGCCTGCCGC * PG:Z:MarkDuplicates RG:Z:IA084C NM:i:1 MQ:i:22 AS:i:97 XS:i:97
HWI-ST1122:264:C2LCWACXX:1:2113:17084:65824 147 1 13495 22 5S96M = 13405 -186 CTCCTCTGCCTGGCGATGTGCCCGTCCTTTGCTCTGACCGCTGGAGACAGTGTTTGTCATTGGCATGGTCTGCAGGGATCCTGCTACAAAGGTGAAACCCA "
! PG:Z:MarkDuplicates RG:Z:IA084C NM:i:6 MQ:i:22 AS:i:66 XS:i:66
I am guessing that the " and "! shouldn't be there.
Update: I have looked up the ID in the fastq files, the initial bam, and the bam after realignment around indels (see below). Something is happening during indel realignment. I removed the ID from the original bam, reindexed, and realigned around indels, then got the same error in BQSR with another entry. Can anyone see the problem in the sample below?
Raw fastq:
@HWI-ST1122:264:C2LCWACXX:1:2113:17084:65824 1:N:0:AAACAT
CCTCCACCACCCCGAGATCACATTTCTCACTGCCTTTTGTCTGCCCAGTTTCACCAGAAGTAGGCCTCTTCCTGACAGGCAGCTGCACCACTGCCTGCCGC
+
??@AAABADADFDE?@@GGE>
@HWI-ST1122:264:C2LCWACXX:1:2113:17084:65824 2:N:0:AAACAT
TGGGTTTCACCTTTGTAGCAGGATCCCTGCAGACCATGCCAATGACAAACACTGTCTCCAGCGGTCAGAGCAAAGGACGGGCACATCGCCAGGCAGAGGAG
+
:==+@+22?22<+++22,@A7++3A7=<=+1**1?A#################################################################
MDF869:samtools-0.1.18 Allison$ ./samtools view /Volumes/Passport/IA_084/IA084C.bam |grep HWI-ST1122:264:C2LCWACXX:1:2113:17084:65824
HWI-ST1122:264:C2LCWACXX:1:2113:17084:65824 99 1 13405 22 101M = 13495 186 CCTCCACCACCCCGAGATCACATTTCTCACTGCCTTTTGTCTGCCCAGTTTCACCAGAAGTAGGCCTCTTCCTGACAGGCAGCTGCACCACTGCCTGCCGC ??@AAABADADFDE?@@GGE>
MDF869:samtools-0.1.18 Allison$ ./samtools view /Volumes/Passport/IA_084/IA084C.realigned.bam |grep HWI-ST1122:264:C2LCWACXX:1:2113:17084:65824
HWI-ST1122:264:C2LCWACXX:1:2113:17084:65824 99 1 13405 22 101M = 13495 186 CCTCCACCACCCCGAGATCACATTTCTCACTGCCTTTTGTCTGCCCAGTTTCACCAGAAGTAGGCCTCTTCCTGACAGGCAGCTGCACCACTGCCTGCCGC * RG:Z:IA084C NM:i:1 MQ:i:22 AS:i:97 XS:i:97
HWI-ST1122:264:C2LCWACXX:1:2113:17084:65824 147 1 13495 22 5S96M = 13405 -186 CTCCTCTGCCTGGCGATGTGCCCGTCCTTTGCTCTGACCGCTGGAGACAGTGTTTGTCATTGGCATGGTCTGCAGGGATCCTGCTACAAAGGTGAAACCCA "
! RG:Z:IA084C NM:i:6 MQ:i:22 AS:i:66 XS:i:66
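Rather than grepping for one offending read at a time, it may help to list every record whose base and quality counts disagree before running BQSR. The following is only a rough sketch, not part of GATK or samtools: the function name find_mismatched_records is made up for illustration, and it assumes tab-separated SAM text such as the output of samtools view. It treats a QUAL column of * as zero qualities, which seems to match the "[101 bases] [0 quals]" wording in the error above, and it flags lines with fewer than 11 columns, which is what a stray line break inside the quality string would produce.

```python
def find_mismatched_records(sam_lines):
    """Yield (read_name, n_bases, n_quals) for suspicious SAM records.

    Hypothetical helper: expects tab-separated SAM text lines,
    for example the output of `samtools view`.
    """
    for line in sam_lines:
        line = line.rstrip("\n")
        # Skip blank lines and SAM header lines.
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            # Fewer than the 11 mandatory SAM columns: the record is
            # truncated or was split by an embedded line break.
            yield (fields[0], None, None)
            continue
        seq, qual = fields[9], fields[10]
        # "*" means the field is not stored, so count it as length 0.
        n_bases = 0 if seq == "*" else len(seq)
        n_quals = 0 if qual == "*" else len(qual)
        if n_bases != n_quals:
            yield (fields[0], n_bases, n_quals)
```

One way to use it would be to pipe samtools view output into a small script that passes sys.stdin to this function and prints each tuple. Running it on both the original and the realigned bam should show whether the qualities are already missing before indel realignment or only after it.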