GATK4 VariantAnnotator deletes variants
4 months ago
ManuelDB ▴ 110

I am using VariantAnnotator from gatk:4.1.3.0 without success, and no clear error message is shown. However, the same command works fine in GATK3. Below is the code for both versions.

 apptainer exec --bind /mnt:/mnt docker://broadinstitute/gatk:4.1.3.0 \
          /gatk/gatk IndexFeatureFile \
          -F "240605_M00321_0177_000000000-GLNFM_1109026765.vcf"

apptainer exec --bind /mnt:/mnt docker://broadinstitute/gatk:4.1.3.0 \
          /gatk/gatk VariantAnnotator  \
            -R $bwarefgenomepath \
            -V "1109026765/240605_M00321_0177_000000000-GLNFM_1109026765.vcf" \
            -I "1109026765/240605_M00321_0177_000000000-GLNFM_1109026765.bam" \
            -O "1109026765/240605_M00321_0177_000000000-GLNFM_1109026765_anno_GATK4.vcf" \
            -L "$ampliconscoordenatesbed" \
            -A BaseQualityRankSumTest \
            -A ReadPosRankSumTest \
            --verbosity DEBUG


tail "1109026765/240605_M00321_0177_000000000-GLNFM_1109026765_anno_GATK4.vcf"

apptainer exec --bind /mnt:/mnt docker://pegi3s/gatk-3:3.8-0 java \
        -Xmx4G -jar /opt/GenomeAnalysisTK.jar \
        -T VariantAnnotator \
        -R $bwarefgenomepath \
        -I "1109026765/240605_M00321_0177_000000000-GLNFM_1109026765.bam" \
        -V "1109026765/240605_M00321_0177_000000000-GLNFM_1109026765.vcf" \
        -L "$ampliconscoordenatesbed"  \
        -o "1109026765/240605_M00321_0177_000000000-GLNFM_1109026765_anno_GATK3.vcf" \
        -A BaseQualityRankSumTest \
        -A ReadPosRankSumTest \
        -dt NONE

tail "1109026765/240605_M00321_0177_000000000-GLNFM_1109026765_anno_GATK3.vcf"

Results:

1109026765/240605_M00321_0177_000000000-GLNFM_1109026765.vcf has 1 variant

1109026765/240605_M00321_0177_000000000-GLNFM_1109026765_anno_GATK4.vcf has 0 variants

1109026765/240605_M00321_0177_000000000-GLNFM_1109026765_anno_GATK3.vcf has 1 variant

I have also tried other GATK4 versions, including gatk:4.6.0.0, with the same result.

This is the log output from the tool (gatk:4.6.0.0 run), followed by the tail of the output VCF:

 Using GATK jar /gatk/gatk-package-4.6.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.6.0.0-local.jar VariantAnnotator -R /mnt/data1/db/gcp-public-data--broad-references/hg38/v0/GIABv1mask/Homo_sapiens_assembly38.GIABv1mask.fasta -V /mnt/scratch1/projects/SSrep/240605_M00321_0177_000000000-GLNFM-light/1109026765/240605_M00321_0177_000000000-GLNFM_1109026765.vcf -I /mnt/scratch1/projects/SSrep/240605_M00321_0177_000000000-GLNFM-light/1109026765/240605_M00321_0177_000000000-GLNFM_1109026765.bam -O /mnt/scratch1/projects/SSrep/240605_M00321_0177_000000000-GLNFM-light/1109026765/240605_M00321_0177_000000000-GLNFM_1109026765_anno.vcf -L /home/dominm/SomaticPipeline/ref_files/bed_files/SSrep/SSrep_amplicons_v4.1.bed -A BaseQualityRankSumTest -A ReadPosRankSumTest --verbosity DEBUG
08:35:33.263 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.6.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
08:35:33.299 DEBUG NativeLibraryLoader - Extracting libgkl_compression.so to /tmp/libgkl_compression6022680971929891226.so
08:35:33.461 INFO  VariantAnnotator - ------------------------------------------------------------
08:35:33.464 INFO  VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.6.0.0
08:35:33.464 INFO  VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
08:35:33.465 INFO  VariantAnnotator - Executing as dominm@wglsbi01 on Linux v4.18.0-553.8.1.el8_10.x86_64 amd64
08:35:33.465 INFO  VariantAnnotator - Java runtime: OpenJDK 64-Bit Server VM v17.0.9+9-Ubuntu-122.04
08:35:33.465 INFO  VariantAnnotator - Start Date/Time: July 23, 2024 at 8:35:33 AM GMT
08:35:33.465 INFO  VariantAnnotator - ------------------------------------------------------------
08:35:33.465 INFO  VariantAnnotator - ------------------------------------------------------------
08:35:33.465 INFO  VariantAnnotator - HTSJDK Version: 4.1.1
08:35:33.466 INFO  VariantAnnotator - Picard Version: 3.2.0
08:35:33.466 INFO  VariantAnnotator - Built for Spark Version: 3.5.0
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.BUFFER_SIZE : 131072
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.CREATE_INDEX : false
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.CREATE_MD5 : false
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.CUSTOM_READER_FACTORY :
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.DISABLE_SNAPPY_COMPRESSOR : false
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.EBI_REFERENCE_SERVICE_URL_MASK : https://www.ebi.ac.uk/ena/cram/md5/%s
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.NON_ZERO_BUFFER_SIZE : 131072
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.REFERENCE_FASTA : null
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:35:33.467 INFO  VariantAnnotator - HTSJDK Defaults.USE_CRAM_REF_DOWNLOAD : false
08:35:33.468 DEBUG ConfigFactory - Configuration file values:
08:35:33.469 DEBUG ConfigFactory -      gcsMaxRetries = 20
08:35:33.469 DEBUG ConfigFactory -      gcsProjectForRequesterPays =
08:35:33.469 DEBUG ConfigFactory -      gatk_stacktrace_on_user_exception = false
08:35:33.469 DEBUG ConfigFactory -      samjdk.use_async_io_read_samtools = false
08:35:33.469 DEBUG ConfigFactory -      samjdk.use_async_io_write_samtools = true
08:35:33.469 DEBUG ConfigFactory -      samjdk.use_async_io_write_tribble = false
08:35:33.469 DEBUG ConfigFactory -      samjdk.compression_level = 2
08:35:33.469 DEBUG ConfigFactory -      spark.kryoserializer.buffer.max = 512m
08:35:33.470 DEBUG ConfigFactory -      spark.driver.maxResultSize = 0
08:35:33.470 DEBUG ConfigFactory -      spark.driver.userClassPathFirst = true
08:35:33.470 DEBUG ConfigFactory -      spark.io.compression.codec = lzf
08:35:33.470 DEBUG ConfigFactory -      spark.executor.memoryOverhead = 600
08:35:33.470 DEBUG ConfigFactory -      spark.driver.extraJavaOptions =
08:35:33.470 DEBUG ConfigFactory -      spark.executor.extraJavaOptions =
08:35:33.470 DEBUG ConfigFactory -      codec_packages = [htsjdk.variant, htsjdk.tribble, org.broadinstitute.hellbender.utils.codecs]
08:35:33.470 DEBUG ConfigFactory -      read_filter_packages = [org.broadinstitute.hellbender.engine.filters]
08:35:33.470 DEBUG ConfigFactory -      annotation_packages = [org.broadinstitute.hellbender.tools.walkers.annotator]
08:35:33.470 DEBUG ConfigFactory -      cloudPrefetchBuffer = 40
08:35:33.470 DEBUG ConfigFactory -      cloudIndexPrefetchBuffer = -1
08:35:33.470 DEBUG ConfigFactory -      createOutputBamIndex = true
08:35:33.470 INFO  VariantAnnotator - Deflater: IntelDeflater
08:35:33.470 INFO  VariantAnnotator - Inflater: IntelInflater
08:35:33.470 INFO  VariantAnnotator - GCS max retries/reopens: 20
08:35:33.470 INFO  VariantAnnotator - Requester pays: disabled
08:35:33.471 INFO  VariantAnnotator - Initializing engine
08:35:33.696 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/scratch1/projects/SSrep/240605_M00321_0177_000000000-GLNFM-light/1109026765/240605_M00321_0177_000000000-GLNFM_1109026765.vcf
08:35:33.716 DEBUG GenomeLocParser - Prepared reference sequence contig dictionary
08:35:33.717 DEBUG GenomeLocParser -  chr3 (198295559 bp)
08:35:33.717 DEBUG GenomeLocParser -  chr9 (138394717 bp)
08:35:33.717 DEBUG GenomeLocParser -  chr10 (133797422 bp)
08:35:33.717 DEBUG GenomeLocParser -  chr13 (114364328 bp)
08:35:33.717 DEBUG GenomeLocParser -  chr15 (101991189 bp)
08:35:33.717 DEBUG GenomeLocParser -  chr17 (83257441 bp)
08:35:33.721 INFO  FeatureManager - Using codec BEDCodec to read file file:///home/dominm/SomaticPipeline/ref_files/bed_files/SSrep/SSrep_amplicons_v4.1.bed
08:35:33.729 DEBUG FeatureDataSource - Cache statistics for FeatureInput /home/dominm/SomaticPipeline/ref_files/bed_files/SSrep/SSrep_amplicons_v4.1.bed:/home/dominm/SomaticPipeline/ref_files/bed_files/SSrep/SSrep_amplicons_v4.1.bed:
08:35:33.729 DEBUG FeatureCache - Cache hit rate  was 0.00% (0 out of 0 total queries)
08:35:33.730 INFO  IntervalArgumentCollection - Processing 84444 bp from intervals
08:35:33.741 INFO  VariantAnnotator - Done initializing engine
08:35:33.742 INFO  VariantAnnotator - Shutting down engine
08:35:33.743 DEBUG FeatureDataSource - Cache statistics for FeatureInput drivingVariantFile:/mnt/scratch1/projects/SSrep/240605_M00321_0177_000000000-GLNFM-light/1109026765/240605_M00321_0177_000000000-GLNFM_1109026765.vcf:
08:35:33.743 DEBUG FeatureCache - Cache hit rate  was 0.00% (0 out of 0 total queries)
08:35:33.743 DEBUG FeatureDataSource - Cache statistics for FeatureInput drivingVariantFile:/mnt/scratch1/projects/SSrep/240605_M00321_0177_000000000-GLNFM-light/1109026765/240605_M00321_0177_000000000-GLNFM_1109026765.vcf:
08:35:33.743 DEBUG FeatureCache - Cache hit rate  was 0.00% (0 out of 0 total queries)
[July 23, 2024 at 8:35:33 AM GMT] org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=285212672
***********************************************************************

A USER ERROR has occurred: Reads sample '1109026765' from readgroups tags does not match any sample in the variant genotypes

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
##contig=<ID=HLA-DRB1*15:01:01:03,length=11056>
##contig=<ID=HLA-DRB1*15:01:01:04,length=11056>
##contig=<ID=HLA-DRB1*15:02:01,length=10313>
##contig=<ID=HLA-DRB1*15:03:01:01,length=11567>
##contig=<ID=HLA-DRB1*15:03:01:02,length=11569>
##contig=<ID=HLA-DRB1*16:02:01,length=11005>
##fileDate=20240723
##reference=file:///mnt/data1/db/gcp-public-data--broad-references/hg38/v0/GIABv1mask/Homo_sapiens_assembly38.GIABv1mask.fasta
##source=Pisces 5.2.10.49
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  240605_M00321_0177_000000000-GLNFM_1109026765.bam
4 months ago

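Maybe this is the problem:

A USER ERROR has occurred: Reads sample '1109026765' from readgroups tags does not match any sample in the variant genotypes

The read groups in your BAM file carry the sample name '1109026765' (@RG with SM:1109026765),

but your VCF contains no sample with that name: its only sample column is named '240605_M00321_0177_000000000-GLNFM_1109026765.bam' (see the #CHROM line at the end of your output).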
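
To confirm a mismatch like this, the sample names on both sides can be listed and compared. The following is only a minimal sketch, assuming a tab-delimited, single-sample VCF and that `samtools` is installed to dump the BAM header. The function names (`vcf_samples`, `sam_header_samples`, `rename_vcf_sample`) are hypothetical helpers written for this illustration, not part of GATK.

```shell
#!/usr/bin/env bash
# Sketch: compare VCF sample names with BAM read-group sample names (SM tags).
# All function names below are hypothetical helpers, not GATK or samtools commands.

# Print the sample names of a VCF file: columns 10 onward of the #CHROM line.
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Print the unique SM values of the @RG lines in a SAM header read from stdin.
# Typical input (assumes samtools is available): samtools view -H input.bam
sam_header_samples() {
    awk -F'\t' '/^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4) }' | sort -u
}

# Rewrite the #CHROM line of a single-sample VCF so its sample column becomes "$1".
# Reads the VCF on stdin and writes the renamed VCF to stdout; records are untouched.
rename_vcf_sample() {
    awk -F'\t' -v OFS='\t' -v name="$1" '/^#CHROM/ { $10 = name } { print }'
}

# Example usage (file names are placeholders):
#   vcf_samples input.vcf
#   samtools view -H input.bam | sam_header_samples
#   rename_vcf_sample 1109026765 < input.vcf > renamed.vcf
```

If the two lists differ, one side has to be renamed so that they agree. Dedicated tools such as `bcftools reheader -s` or Picard's `RenameSampleInVcf` do the same job as the `awk` rewrite above and are probably the safer choice for real data. A renamed VCF would also need to be indexed again before it is passed to VariantAnnotator.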
