Hi all,
For a comparative study, we received BAM-files from a lot of different laboratories. We would like to have a look at the Fold_80_base_penalty and coverage statistics. Therefore, I used the CollectTargetedPcrMetrics part of Picard-metrics. This works fine for the BAM-files that we generated ourselves, for which we also have the correct reference sequence. However, when we try to run CollectTargetedPcrMetrics for BAM-files that we received and we do not have the correct reference sequence, we get a lot of errors. As a reference sequence, we use the standard reference sequence available here (human.g1k_v37.fa). I know for sure that the other BAM-files were also generated using hg19, but I presume there are some subtle differences in the actual reference sequences used. Below the error I get:
java -jar picard.jar CollectTargetedPcrMetrics AMPLICON_INTERVALS=tumor_fpa.hg19.tech.target.interval_list TARGET_INTERVALS=tumor_fpa.hg19.tech.target.interval_list OUTPUT=/Participant23_sampleA_TSTPool_TST_A23_S1_pcr_metrics.txt INPUT=Part25_sampleA.bam REFERENCE_SEQUENCE=human_g1k_v37.fa VALIDATION_STRINGENCY=LENIENT
[Mon Dec 12 CET 2016] picard.analysis.directed.CollectTargetedPcrMetrics
[Mon Dec 12 CET 2016] Executing as lien@lien on Linux 4.4.0-53-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15; Picard version: 2.4.1(7c4d36e011df1aec4689b51efcada44e92d1817f_1464389670) JdkDeflater
[Mon Dec 12 CET 2016] picard.analysis.directed.CollectTargetedPcrMetrics done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=504889344
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: **Sequences at index 0 don't match:** 0/16571/M/M5=6b9e15a5937653b0006de435f91578c0 0/249250621/1/UR=file:/humgen/gsa-hpprojects/1kg/reference/human_g1k_v37.fasta/M5=1b22b98cdeb4a9304cb5d48026a85128
at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:209)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:271)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:257)
at picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:114)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
We need to change already the .dict and .fai file generated from our reference sequence, but still this will not solve the problem, as I think there are small differences in the actual reference sequence they used.
Are there any solutions to work around this problem? To force Picard in using the reference sequence? Thanks!
Hi Harold, Thanks for your answer, but I'm not sure it is present in every header. Below the output of samtools view for one of our samples:
This doesn't allow to find the reference sequence that was used, or am I wrong?
Hmmm, my headers always include a 'CL' field as part of the @PG line that shows the command line (including reference) used for alignment. Yours doesn't, but it appears to have been generated using Illumina's ISIS pipeline. You may try to contact them and see if there's a straightforward method to match the ISIS version to reference genome version.