Corrupted bam file
1
1
Entering edit mode
10.2 years ago

I have a bam file which appears to be corrupted but I can't understand where and why.

The bam and index was generated using this script. It was submitted to LSF via bsub and it completed successfully:

bwa sampe $ref <(bwa aln $ref $fq1) <(bwa aln $ref $fq2) $fq1 $fq2 \
| samtools view -Su - \
| samtools sort - bam/$bname &&
samtools index bam/${bname}.bam

I used the same script for other files which seem to be ok. Also, I repeated the alignment in case some corruption occurred on the disks but it made no difference.

The problem or symptom is that the number of alignments on the first chromosome is inconsistent with what reported in the index:

samtools view -c $bam LmjF.01
96370

However the index file reports 96372 alignments (95350 + 1022):

samtools idxstats $bam
LmjF.01    268988    95350    1022
LmjF.02    355712    139731    1441
LmjF.03    384502    105515    1303
...

I found out something was weird because running picard gave me the follow error:

java -jar ~/bin/CollectMultipleMetrics.jar I=$bam R=$ref O=$outname VALIDATION_STRINGENCY=SILENT

[Wed Aug 20 10:45:07 BST 2014] picard.analysis.CollectMultipleMetrics INPUT=fk041_F5_10_DIP1.bam REFERENCE_SEQUENCE=/lustre/sblab/berald01/reference_data/genomes/leishmania_major/LmjF_v6.1_spike.fa OUTPUT=/lustre/sblab/berald01/projects/20140818_fumi_hmu_pull_down/20140812_miseq/alnSummaryMetrics//fk041_F5_10_DIP1 VALIDATION_STRINGENCY=SILENT    ASSUME_SORTED=true STOP_AFTER=0 PROGRAM=[CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle] VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Aug 20 10:45:07 BST 2014] Executing as berald01@uk-cri-lcst01.crnet.org on Linux 2.6.18-274.3.1.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.6.0_35-b10; Picard version: 1.115(30b1e546cc4dd80c918e151dbfe46b061e63f315_1402927010) JdkDeflater
WARNING    2014-08-20 10:45:08    SinglePassSamProgram    File reports sort order 'unsorted', assuming it's coordinate sorted anyway.
[Wed Aug 20 10:45:09 BST 2014] picard.analysis.CollectMultipleMetrics done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=252116992
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.collectQualityData(AlignmentSummaryMetricsCollector.java:331)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.addRecord(AlignmentSummaryMetricsCollector.java:229)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:155)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:127)
    at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(MultiLevelCollector.java:192)
    at picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:315)
    at picard.analysis.AlignmentSummaryMetricsCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:71)
    at picard.analysis.CollectAlignmentSummaryMetrics.acceptRead(CollectAlignmentSummaryMetrics.java:114)
    at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:117)
    at picard.analysis.CollectMultipleMetrics.doWork(CollectMultipleMetrics.java:136)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
    at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:124)
    at picard.analysis.CollectMultipleMetrics.main(CollectMultipleMetrics.java:100)

EDIT: Version info:

Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.18 (r982:295)

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.8-r455

java -jar ~/bin/CollectMultipleMetrics.jar --version
1.115(30b1e546cc4dd80c918e151dbfe46b061e63f315_1402927010)

Does anybody know where the issue could be?

samtools bwa bam • 5.6k views
ADD COMMENT
0
Entering edit mode

Which version of samtools are you using? I've seen a report of something weird like this on SEQanswers but couldn't reproduce it. If you're using the new htslib-dependent version, did you download directly from github and, if so, did you switch to the specific 1.0 tag releases before compiling htslib and samtools?

ADD REPLY
0
Entering edit mode

Thanks for replying Devon. I edited my post to add versions info. No, I'm not using htslib and the 1.x release (in fact, I discovered its existence now only!)

ADD REPLY
1
Entering edit mode

I wonder if this was a bug that got fixed in a more recent version. 0.1.18 is pretty old, so try either 0.1.19 or the newer 1.0 release.

ADD REPLY
0
Entering edit mode

Yes, I'll update to 1.x and see what happens..., thanks again.

ADD REPLY
2
Entering edit mode
10.2 years ago

To answer my own question... The problem disappears after upgrading to samtools-1.0

Bad news: It seems samtools-1.0 idxstats index incorrectly reports the number of unmapped reads as reported in this thread on SEQanswers

ADD COMMENT
2
Entering edit mode

For those coming to this post and wondering, there's been a bug report filed.

ADD REPLY
1
Entering edit mode

Just to note (I started that thread), the problem is not with samtools 1.0 idxstats, but with samtools 1.0 index. Running samtools 1.0 idxstats on a samtools 0.1.19 index works fine. Running samtools 0.1.19 idxstats on a samtools 1.0 index shows the error.

ADD REPLY
1
Entering edit mode

Thank you for pointing it out. I updated my post. And thanks to @Devon Ryan for reporting the bug.

ADD REPLY
0
Entering edit mode

Ah, I'd missed that in the original thread. I've updated the bug report.

ADD REPLY

Login before adding your answer.

Traffic: 1003 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6