Hi all,
I have been having issues using VariantRecalibrator in GATK 4.4.0.0. My command is:
gatk VariantRecalibrator \
-R {ref.fasta} -V {input.vcf.gz} \
--resource:hapmap,known=false,training=true,truth=true,prior=15.0 {path/to/vcf.gz} \
--resource:omni,known=false,training=true,truth=false,prior=12.0 {path/to/vcf.gz} \
--resource:1000G,known=false,training=true,truth=false,prior=10.0 {path/to/vcf.gz} \
--resource:dbsnp,known=true,training=false,truth=false,prior=2.0 {path/to/vcf.gz} \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
-mode SNP \
-O {output_vqsr.recal} --tranches-file {output_vqsr.tranches} --dont-run-rscript
And the error I have been encountering is:
A USER ERROR has occurred: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
I have run VariantAnnotator using the command:
gatk VariantAnnotator \
-R {ref.fasta} -V {input.vcf} -O {annotated_input.vcf} \
-A QualByDepth -A Coverage -A DepthPerSampleHC -A StrandOddsRatio -A FisherStrand \
-A RMSMappingQuality -A MappingQualityRankSumTest -A ReadPosRankSumTest
I then used annotated_input.vcf as the input for VariantRecalibrator, only to get the same error. Having manually checked my inputs (both the annotated VCF and the original VCF), I can confirm that they do include the QD annotation in the INFO column, which suggests that VariantRecalibrator is failing to detect it. For example, the INFO column for one of the variants is as follows:
DP=164;ExcessHet=0;FS=0.000;MLEAC=2,0,0;MLEAF=1,0,0;QD=27.22;RAW_MQandDP=590400,164;SOR=1.944
Has anyone encountered this error and can suggest a solution? I have repeated the whole process with several samples in case it was caused by a corrupted file, but I always get the same error message, even though in every case I can see by manual inspection that the annotation is present.
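As a quick sanity check, the INFO string quoted above can be compared against the annotations requested with -an in the VariantRecalibrator command. The following is only a minimal sketch: the helper names parse_info and missing_annotations are hypothetical, and it inspects a single record, so it says nothing about the rest of the callset or about how VariantRecalibrator itself reads annotations.

```python
# Annotations requested with -an in the VariantRecalibrator command above.
REQUESTED = ["QD", "MQ", "MQRankSum", "ReadPosRankSum", "FS", "SOR"]


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def missing_annotations(info, requested=REQUESTED):
    """Return the requested annotation names that are absent from INFO."""
    present = parse_info(info)
    return [name for name in requested if name not in present]


# INFO column copied from the example record in this post.
example = ("DP=164;ExcessHet=0;FS=0.000;MLEAC=2,0,0;MLEAF=1,0,0;"
           "QD=27.22;RAW_MQandDP=590400,164;SOR=1.944")

print(missing_annotations(example))
# -> ['MQ', 'MQRankSum', 'ReadPosRankSum']
```

For this one record QD, FS and SOR are present, while MQ, MQRankSum and ReadPosRankSum are not (RAW_MQandDP is a different key from MQ). A single record lacking these keys does not prove they are missing everywhere, but it may be worth running the same check across the whole file.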
The full output message for VariantRecalibrator is below:
Using GATK jar /mnt/home/soft/gatk/programs/x86_64/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /mnt/home/soft/gatk/programs/x86_64/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar VariantRecalibrator -R /mnt2/fscratch/users/oncoh_011_ibima/sbrown/data/reference_genome/Homo_sapiens_assembly38.fasta -V /mnt2/fscratch/users/oncoh_011_ibima/sbrown/data/wes_data/gatk_out/vcf/germline/LAAD318_snps_marked.vcf.gz --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /mnt/scratch/users/oncoh_011_ibima/data/genomic_reference_data/GRCh38/GATK/hapmap_3.3.hg38.vcf.gz --resource:omni,known=false,training=true,truth=false,prior=12.0 /mnt/scratch/users/oncoh_011_ibima/data/genomic_reference_data/GRCh38/GATK/1000G_omni2.5.hg38.vcf.gz --resource:1000G,known=false,training=true,truth=false,prior=10.0 /mnt/scratch/users/oncoh_011_ibima/data/genomic_reference_data/GRCh38/GATK/1000G_phase1.snps.high_confidence.hg38.vcf.gz --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /mnt/scratch/users/oncoh_011_ibima/data/genomic_reference_data/dbsnp/Homo_sapiens_assembly38.dbsnp138.vcf -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -O /mnt2/fscratch/users/oncoh_011_ibima/sbrown/data/wes_data/gatk_out/vcf/germline/LAAD318_snp_vqsr.recal --tranches-file /mnt2/fscratch/users/oncoh_011_ibima/sbrown/data/wes_data/gatk_out/vcf/germline/LAAD318_snp_vqsr.tranches --dont-run-rscript
13:10:00.618 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/home/soft/gatk/programs/x86_64/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:10:00.751 INFO VariantRecalibrator - ------------------------------------------------------------
13:10:00.755 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.4.0.0
13:10:00.755 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
13:10:00.756 INFO VariantRecalibrator - Executing as sbrown@sr001 on Linux v5.14.21-150400.22-default amd64
13:10:00.756 INFO VariantRecalibrator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v17.0.1+12-LTS-39
13:10:00.764 INFO VariantRecalibrator - Start Date/Time: 20 de agosto de 2024, 13:10:00 CEST
13:10:00.764 INFO VariantRecalibrator - ------------------------------------------------------------
13:10:00.764 INFO VariantRecalibrator - ------------------------------------------------------------
13:10:00.765 INFO VariantRecalibrator - HTSJDK Version: 3.0.5
13:10:00.765 INFO VariantRecalibrator - Picard Version: 3.0.0
13:10:00.765 INFO VariantRecalibrator - Built for Spark Version: 3.3.1
13:10:00.765 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:10:00.765 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:10:00.765 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:10:00.766 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:10:00.766 INFO VariantRecalibrator - Deflater: IntelDeflater
13:10:00.766 INFO VariantRecalibrator - Inflater: IntelInflater
13:10:00.766 INFO VariantRecalibrator - GCS max retries/reopens: 20
13:10:00.766 INFO VariantRecalibrator - Requester pays: disabled
13:10:00.767 INFO VariantRecalibrator - Initializing engine
13:10:01.284 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/scratch/users/oncoh_011_ibima/data/genomic_reference_data/GRCh38/GATK/hapmap_3.3.hg38.vcf.gz
13:10:01.824 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/scratch/users/oncoh_011_ibima/data/genomic_reference_data/GRCh38/GATK/1000G_omni2.5.hg38.vcf.gz
13:10:02.050 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/scratch/users/oncoh_011_ibima/data/genomic_reference_data/GRCh38/GATK/1000G_phase1.snps.high_confidence.hg38.vcf.gz
13:10:02.298 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/scratch/users/oncoh_011_ibima/data/genomic_reference_data/dbsnp/Homo_sapiens_assembly38.dbsnp138.vcf
13:10:02.744 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt2/fscratch/users/oncoh_011_ibima/sbrown/data/wes_data/gatk_out/vcf/germline/LAAD318_snps_marked.vcf.gz
13:10:03.101 INFO VariantRecalibrator - Done initializing engine
13:10:03.115 INFO TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
13:10:03.115 INFO TrainingSet - Found omni track: Known = false Training = true Truth = false Prior = Q12.0
13:10:03.115 INFO TrainingSet - Found 1000G track: Known = false Training = true Truth = false Prior = Q10.0
13:10:03.115 INFO TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
13:10:03.163 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
13:10:03.305 INFO ProgressMeter - Starting traversal
13:10:03.306 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
13:10:04.118 WARN IntelInflater - Zero Bytes Written : 0
13:10:04.170 INFO ProgressMeter - chr22:46535180 0.0 33802 2443518.1
13:10:04.170 INFO ProgressMeter - Traversal complete. Processed 33802 total variants in 0.0 minutes.
13:10:04.198 INFO VariantRecalibrator - Shutting down engine
[20 de agosto de 2024, 13:10:04 CEST] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=788529152
A USER ERROR has occurred: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
org.broadinstitute.hellbender.exceptions.UserException$BadInput: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantDataManager.normalizeData(VariantDataManager.java:81)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:638)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1102)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Does your input VCF contain variants that are present in omni, 1000G, or hapmap?
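One rough way to answer that question is to count how many records in the input VCF fall on sites listed in a training resource, and how many of those carry QD. The sketch below is written under several assumptions: the function names are hypothetical, sites are matched on CHROM and POS only (which is a simplification and not necessarily how VariantRecalibrator matches training variants), and the whole resource is held in memory as a set.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def site_keys(lines):
    """Yield (CHROM, POS) for every non-header VCF record."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        yield chrom, int(pos)


def overlap_summary(input_lines, resource_lines, annotation="QD"):
    """Count input records, those on resource sites, and those also annotated."""
    resource_sites = set(site_keys(resource_lines))
    total = overlapping = annotated = 0
    for line in input_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        total += 1
        if (cols[0], int(cols[1])) not in resource_sites:
            continue
        overlapping += 1
        # Column 8 (index 7) is INFO; keep only the key names.
        info_keys = {item.split("=", 1)[0] for item in cols[7].split(";")}
        if annotation in info_keys:
            annotated += 1
    return {"total": total, "overlapping": overlapping, "annotated": annotated}
```

With real files this could be called as follows, where both paths are placeholders for the input callset and one training resource:

    with open_vcf("input.vcf.gz") as inp, open_vcf("hapmap.vcf.gz") as res:
        print(overlap_summary(inp, res))

If "overlapping" comes back as 0, or "annotated" is 0 while "overlapping" is not, that would be consistent with the error message, which refers to training variants specifically and not to the callset as a whole.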