Empty VCF file from Mutect2 v. GATK 4.1.6.0 in tumor only mode
0
0
Entering edit mode
4.2 years ago
pm2012 ▴ 140

Hi

I am trying to use Mutect2 from GATK to detect variants from my PacBio HiFi data. I am using Minimap2 to align my reads against the reference and using the BAM file as input to Mutect2. However, I am getting an empty VCF file. I also tested the Haplotype caller but no luck. Could you someone help me figure out what the issue is?

Here are the commands I am using:

minimap2 -a -L reference.fa sample1.fa > sample1.bam 

java -jar $EBROOTPCRDTOOLS/AddOrReplaceReadGroups.jar I=sample1.bam O=sample1_rg.bam RGID=1 RGLB=lib1 RGPL=PB RGPU=1 RGSM=sample1

bamtools index sample1_rg.bam

gatk Mutect2 -R CoV_SEM_region.fa -I sample1_rg.bam -O sample1_rg.vcf.gz

gatk --java-options "-Xmx4g" HaplotypeCaller -R reference.fa -I sample1_rg.bam -O sample1_rg.vcf.gz

In the log of the run, I do not see any error messages. Here is the log in case this is useful:

Example:

----------
[Tue Sep 29 08:31:36 EDT 2020] picard.sam.CreateSequenceDictionary REFERENCE=reference.fa OUTPUT=reference.dict    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Tue Sep 29 08:31:36 EDT 2020] Executing as abc@ai-hpcn106.cm.cluster on Linux 3.10.0-327.36.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14; Picard version: 1.119(d44cdb51745f5e8075c826430a39d8a61f1dd832_1408991805) IntelDeflater
[Tue Sep 29 08:31:36 EDT 2020] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
[M::mm_idx_gen::0.003*2.01] collected minimizers
[M::mm_idx_gen::0.005*2.31] sorted minimizers
[M::main::0.005*2.29] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.005*2.29] mid_occ = 2
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.005*2.21] distinct minimizers: 1051 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.356
[M::worker_pipeline::0.240*2.85] mapped 265 sequences
[M::main] Version: 2.10-r784-dirty
[M::main] CMD: minimap2 -a -L reference.fa sample1.fa
[M::main] Real time: 0.242 sec; CPU: 0.686 sec
[Tue Sep 29 08:31:36 EDT 2020] picard.sam.AddOrReplaceReadGroups INPUT=sample1.bam OUTPUT=sample1_rg.bam RGID=1 RGLB=lib1 RGPL=PB RGPU=1 RGSM=sample1    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Tue Sep 29 08:31:36 EDT 2020] Executing as abc@ai-hpcn106.cm.cluster on Linux 3.10.0-327.36.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14; Picard version: 1.119(d44cdb51745f5e8075c826430a39d8a61f1dd832_1408991805) IntelDeflater
INFO    2020-09-29 08:31:37 AddOrReplaceReadGroups  Created read group ID=1 PL=PB LB=lib1 SM=sample1

[Tue Sep 29 08:31:37 EDT 2020] picard.sam.AddOrReplaceReadGroups done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
Using GATK jar /sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar Mutect2 -R reference.fa -I sample1_rg.bam -O sample1_rg.vcf.gz
08:31:40.308 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Sep 29, 2020 8:31:40 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
08:31:40.583 INFO  Mutect2 - ------------------------------------------------------------
08:31:40.584 INFO  Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.6.0
08:31:40.584 INFO  Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
08:31:40.586 INFO  Mutect2 - Executing as abc@ai-hpcn106.cm.cluster on Linux v3.10.0-327.36.1.el7.x86_64 amd64
08:31:40.586 INFO  Mutect2 - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_92-b14
08:31:40.587 INFO  Mutect2 - Start Date/Time: September 29, 2020 8:31:40 AM EDT
08:31:40.587 INFO  Mutect2 - ------------------------------------------------------------
08:31:40.587 INFO  Mutect2 - ------------------------------------------------------------
08:31:40.588 INFO  Mutect2 - HTSJDK Version: 2.21.2
08:31:40.588 INFO  Mutect2 - Picard Version: 2.21.9
08:31:40.588 INFO  Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:31:40.588 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:31:40.588 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:31:40.588 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:31:40.588 INFO  Mutect2 - Deflater: IntelDeflater
08:31:40.588 INFO  Mutect2 - Inflater: IntelInflater
08:31:40.589 INFO  Mutect2 - GCS max retries/reopens: 20
08:31:40.589 INFO  Mutect2 - Requester pays: disabled
08:31:40.589 INFO  Mutect2 - Initializing engine
08:31:41.027 INFO  Mutect2 - Done initializing engine
08:31:41.052 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
08:31:41.060 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
08:31:41.089 WARN  NativeLibraryLoader - Unable to load libgkl_pairhmm_omp.so from native/libgkl_pairhmm_omp.so (/tmp/libgkl_pairhmm_omp5792852026840319140.so: /sysapps/cluster/software/GCC/4.8.4/lib64/libgomp.so.1: version `GOMP_4.0' not found (required by /tmp/libgkl_pairhmm_omp5792852026840319140.so))
08:31:41.090 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
08:31:41.090 INFO  NativeLibraryLoader - Loading libgkl_pairhmm.so from jar:file:/sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm.so
08:31:41.164 INFO  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
08:31:41.165 WARN  IntelPairHmm - Ignoring request for 4 threads; not using OpenMP implementation
08:31:41.165 INFO  PairHMM - Using the AVX-accelerated native PairHMM implementation
08:31:41.205 INFO  ProgressMeter - Starting traversal
08:31:41.206 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
08:31:41.344 INFO  Mutect2 - 0 read(s) filtered by: MappingQualityReadFilter 
0 read(s) filtered by: MappingQualityAvailableReadFilter 
0 read(s) filtered by: MappingQualityNotZeroReadFilter 
0 read(s) filtered by: MappedReadFilter 
0 read(s) filtered by: NotSecondaryAlignmentReadFilter 
0 read(s) filtered by: NotDuplicateReadFilter 
0 read(s) filtered by: PassesVendorQualityCheckReadFilter 
0 read(s) filtered by: NonChimericOriginalAlignmentReadFilter 
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter 
0 read(s) filtered by: ReadLengthReadFilter 
0 read(s) filtered by: GoodCigarReadFilter 
265 read(s) filtered by: WellformedReadFilter 
265 total reads filtered
08:31:41.345 INFO  ProgressMeter -       Reference:2701              0.0                    19           8260.9
08:31:41.345 INFO  ProgressMeter - Traversal complete. Processed 19 total regions in 0.0 minutes.
08:31:41.358 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
08:31:41.358 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
08:31:41.358 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
08:31:41.359 INFO  Mutect2 - Shutting down engine
[September 29, 2020 8:31:41 AM EDT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2326265856
GATK SNP • 1.3k views
ADD COMMENT
0
Entering edit mode

Do you know if Mutect2 can be used with long reads? Since it does local realignments, I am not sure if you can.

ADD REPLY
0
Entering edit mode

@genomax this was suggested by PacBio reps along with DeepVariant. For my purpose it works fine (only if I could get this to work).

The problem appears to be the WellformedReadFilter, which is filtering all my reads:

265 read(s) filtered by: WellformedReadFilter

https://gatk.broadinstitute.org/hc/en-us/articles/360037267451-WellformedReadFilter

Disabling this filter leads to array out of index error.

ADD REPLY

Login before adding your answer.

Traffic: 1928 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6