Picard alignment statistics error
1
0
Entering edit mode
8.8 years ago
mia ▴ 70

Hi,

I am running Picard to get alignment stats on my bam file. However, I am getting the following error

[Fri Mar 04 10:54:28 EST 2016] picard.analysis.CollectAlignmentSummaryMetrics ADAPTER_SEQUENCE=[] REFERENCE_SEQUENCE=/mnt/twin/Rust/puccinia_striiformis_pst_78_1_transcripts.fasta INPUT=/mnt/twin/Rust/SWS484SPF_PBJ_ScaffoldsF500_compare_to_PST_sorted.bam OUTPUT=output.txt    MAX_INSERT_SIZE=100000 EXPECTED_PAIR_ORIENTATIONS=[FR] METRIC_ACCUMULATION_LEVEL=[ALL_READS] IS_BISULFITE_SEQUENCED=false ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Fri Mar 04 10:54:28 EST 2016] Executing as surendraa@surendraa-HP-Z640-Workstation on Linux 3.19.0-25-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_74-b02; Picard version: 2.1.0(25ebc07f7fbaa7c1a4a8e6c130c88c1d10681802_1454776546) IntelDeflater
[Fri Mar 04 10:54:28 EST 2016] picard.analysis.CollectAlignmentSummaryMetrics done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=251658240
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.collectQualityData(AlignmentSummaryMetricsCollector.java:330)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.addRecord(AlignmentSummaryMetricsCollector.java:195)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:127)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:93)
    at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(MultiLevelCollector.java:192)
    at picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:315)
    at picard.analysis.AlignmentSummaryMetricsCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:89)
    at picard.analysis.CollectAlignmentSummaryMetrics.acceptRead(CollectAlignmentSummaryMetrics.java:143)
    at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:138)
    at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:77)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

The command I am running is

java -jar picard.jar CollectAlignmentSummaryMetrics R="/mnt/twin/Rust/puccinia_striiformis_pst_78_1_transcripts.fasta" I=/mnt/twin/Rust/SWS484SPF_PBJ_ScaffoldsF500_compare_to_PST_sorted.bam O=output.txt

Any help you can provide would be greatly appreciated, Ann

picard LAST samtools alignment software error • 4.4k views
ADD COMMENT
0
Entering edit mode

It looks like the quality scores in your BAM are out of the range of what the tool can handle. Can you please paste the output of the following command?

samtools view /mnt/twin/Rust/SWS484SPF_PBJ_ScaffoldsF500_compare_to_PST_sorted.bam | head

I want to take a look at the quality score strings in your BAM.

ADD REPLY
0
Entering edit mode

I had the same error, using samtools view mysorted.bam | head | cut -f 5

the quality result

60 60 60 60 0 60 60 60 60 39

any clue?

ADD REPLY
0
Entering edit mode

Those are the MAPQ scores for the read. I'm suggesting that the ASCII-encoded Phred quality scores are out of range. I'm basing that on the source, specifically the collectQualityData method referenced in the trace.

Would you mind posting the output of your command, minus the cut?

ADD REPLY
0
Entering edit mode
1       4       *       0       0       *       *       0       0       AAAAACCCGCCGAAGCGGGTTTTT        *       AS:i:0  XS:i:0  
3       4       *       0       0       *       *       0       0       AAAAATTGCCTGATGCGCTACGCT        *       AS:i:0  XS:i:0  
1455    0       Chromosome      2295796 0       32M     *       0       0       CCAAGCCGGTTGCCTGATGCGACGCTGGCGCG        *       NM:i:0  MD:Z:32 AS:i:32 XS:i:32 XA:Z:Chromosome,+2295570,32M,0;Chromosome,+2295457,32M,1;Chromosome,+2295683,
32M,1;Chromosome,+2295909,32M,1;Chromosome,+2295344,32M,1;  
1457    0       Chromosome      932384  0       32M     *       0       0       CAATATCAGCAGCCGCAACAACCGGTTGCGCC        *       NM:i:0  MD:Z:32 AS:i:32 XS:i:32 XA:Z:Chromosome,+932423,32M,0;  
1459    0       Chromosome      3473808 0       32M     *       0       0       CCCTAACCCTCTCCCCAAAGGGGCGAGGGGAC        *       NM:i:0  MD:Z:32 AS:i:32 XS:i:32 XA:Z:Chromosome,-4042103,32M,0;Chromosome,-621409,32M,0;Chromosome,+3030152,3
2M,0;Chromosome,+3359441,32M,1;
ADD REPLY
0
Entering edit mode

Hmmm, there are no ASCII-encoded per-base quality scores in the result. I wonder if that's the problem. Here's what a read with those present would look like:

ST-E00273:368:H57THCCXY:5:1106:4300:58268       99      chr1    9997    0       151M    =       10329   430     ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCCAACCCTAACCCCAACCCCAACCCTAAC       AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJ<FJJFJ-FAJJJ-<7FJJ7<-AJJA-<FJJ7<JJJJ7-7FFJ-7-AFF-77FJAAAJJJ-A7FJJ)<<JJJ-77FFJ)7<<JJ)7)7AA-7--    RG:Z:RG0        AS:i:128        XS:i:134   NM:i:7

Two questions:

-Does your job fail immediately?

-What aligner are you using?

ADD REPLY
0
Entering edit mode

the job fail immediately, aligner bwa-mem, the aligned reads were fasta reads no quality associated with it (actually it is contigs)

ADD REPLY
1
Entering edit mode
7.2 years ago
Dan D 7.4k

I think the most likely explanation for the failure is the asterisk in place of the quality scores, which is a result of feeding FASTA data into the upstream alignment process. This is based on @Medhat 's information, the name of the BAM in @mia 's listed command which contains the word "scaffold", and the information in the stack trace combined with examination of the source.

I did some digging around and found some more examples to support this notion. Based on the discussions in the link, it seems like a way to make this tool work with your data is to generate a copy of the BAM using the "PrintReads" tool with the --default-base-qualities parameter. This will insert dummy per-base quality scores and allow the tool to execute. Of course, you shouldn't trust any metric which relates to per-base quality, but everything else should be legit.

ADD COMMENT
1
Entering edit mode

based on this answer: java -jar ~/source/GenomeAnalysisTK.jar -T PrintReads -R genome.fa -I contig_sorted_RG.bam -o contig_sorted_rg_fake_quality.bam --defaultBaseQualities 1

and continue to the other step normally and it succeeded.

So I would say this is the correct answer (at least in my case I can not accept it of course)

ADD REPLY

Login before adding your answer.

Traffic: 1749 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6