I have a bam file which appears to be corrupted but I can't understand where and why.
The bam and index was generated using this script. It was submitted to LSF via bsub and it completed successfully:
bwa sampe $ref <(bwa aln $ref $fq1) <(bwa aln $ref $fq2) $fq1 $fq2 \
| samtools view -Su - \
| samtools sort - bam/$bname &&
samtools index bam/${bname}.bam
I used the same script for other files which seem to be ok. Also, I repeated the alignment in case some corruption occurred on the disks but it made no difference.
The problem or symptom is that the number of alignments on the first chromosome is inconsistent with what reported in the index:
samtools view -c $bam LmjF.01
96370
However the index file reports 96372 alignments (95350 + 1022):
samtools idxstats $bam
LmjF.01 268988 95350 1022
LmjF.02 355712 139731 1441
LmjF.03 384502 105515 1303
...
I found out something was weird because running picard gave me the follow error:
java -jar ~/bin/CollectMultipleMetrics.jar I=$bam R=$ref O=$outname VALIDATION_STRINGENCY=SILENT
[Wed Aug 20 10:45:07 BST 2014] picard.analysis.CollectMultipleMetrics INPUT=fk041_F5_10_DIP1.bam REFERENCE_SEQUENCE=/lustre/sblab/berald01/reference_data/genomes/leishmania_major/LmjF_v6.1_spike.fa OUTPUT=/lustre/sblab/berald01/projects/20140818_fumi_hmu_pull_down/20140812_miseq/alnSummaryMetrics//fk041_F5_10_DIP1 VALIDATION_STRINGENCY=SILENT ASSUME_SORTED=true STOP_AFTER=0 PROGRAM=[CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle] VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Aug 20 10:45:07 BST 2014] Executing as berald01@uk-cri-lcst01.crnet.org on Linux 2.6.18-274.3.1.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.6.0_35-b10; Picard version: 1.115(30b1e546cc4dd80c918e151dbfe46b061e63f315_1402927010) JdkDeflater
WARNING 2014-08-20 10:45:08 SinglePassSamProgram File reports sort order 'unsorted', assuming it's coordinate sorted anyway.
[Wed Aug 20 10:45:09 BST 2014] picard.analysis.CollectMultipleMetrics done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=252116992
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.collectQualityData(AlignmentSummaryMetricsCollector.java:331)
at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.addRecord(AlignmentSummaryMetricsCollector.java:229)
at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:155)
at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:127)
at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(MultiLevelCollector.java:192)
at picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:315)
at picard.analysis.AlignmentSummaryMetricsCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:71)
at picard.analysis.CollectAlignmentSummaryMetrics.acceptRead(CollectAlignmentSummaryMetrics.java:114)
at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:117)
at picard.analysis.CollectMultipleMetrics.doWork(CollectMultipleMetrics.java:136)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:124)
at picard.analysis.CollectMultipleMetrics.main(CollectMultipleMetrics.java:100)
EDIT: Version info:
Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.18 (r982:295)
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.8-r455
java -jar ~/bin/CollectMultipleMetrics.jar --version
1.115(30b1e546cc4dd80c918e151dbfe46b061e63f315_1402927010)
Does anybody know where the issue could be?
Which version of samtools are you using? I've seen a report of something weird like this on SEQanswers but couldn't reproduce it. If you're using the new htslib-dependent version, did you download directly from github and, if so, did you switch to the specific 1.0 tag releases before compiling htslib and samtools?
Thanks for replying Devon. I edited my post to add versions info. No, I'm not using htslib and the 1.x release (in fact, I discovered its existence now only!)
I wonder if this was a bug that got fixed in a more recent version. 0.1.18 is pretty old, so try either 0.1.19 or the newer 1.0 release.
Yes, I'll update to 1.x and see what happens..., thanks again.