Entering edit mode
4.2 years ago
pm2012
▴
140
Hi
I am trying to use Mutect2 from GATK to detect variants from my PacBio HiFi data. I am using Minimap2 to align my reads against the reference and using the BAM file as input to Mutect2. However, I am getting an empty VCF file. I also tested the Haplotype caller but no luck. Could you someone help me figure out what the issue is?
Here are the commands I am using:
minimap2 -a -L reference.fa sample1.fa > sample1.bam
java -jar $EBROOTPCRDTOOLS/AddOrReplaceReadGroups.jar I=sample1.bam O=sample1_rg.bam RGID=1 RGLB=lib1 RGPL=PB RGPU=1 RGSM=sample1
bamtools index sample1_rg.bam
gatk Mutect2 -R CoV_SEM_region.fa -I sample1_rg.bam -O sample1_rg.vcf.gz
gatk --java-options "-Xmx4g" HaplotypeCaller -R reference.fa -I sample1_rg.bam -O sample1_rg.vcf.gz
In the log of the run, I do not see any error messages. Here is the log in case this is useful:
Example:
----------
[Tue Sep 29 08:31:36 EDT 2020] picard.sam.CreateSequenceDictionary REFERENCE=reference.fa OUTPUT=reference.dict TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Tue Sep 29 08:31:36 EDT 2020] Executing as abc@ai-hpcn106.cm.cluster on Linux 3.10.0-327.36.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14; Picard version: 1.119(d44cdb51745f5e8075c826430a39d8a61f1dd832_1408991805) IntelDeflater
[Tue Sep 29 08:31:36 EDT 2020] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
[M::mm_idx_gen::0.003*2.01] collected minimizers
[M::mm_idx_gen::0.005*2.31] sorted minimizers
[M::main::0.005*2.29] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.005*2.29] mid_occ = 2
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.005*2.21] distinct minimizers: 1051 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.356
[M::worker_pipeline::0.240*2.85] mapped 265 sequences
[M::main] Version: 2.10-r784-dirty
[M::main] CMD: minimap2 -a -L reference.fa sample1.fa
[M::main] Real time: 0.242 sec; CPU: 0.686 sec
[Tue Sep 29 08:31:36 EDT 2020] picard.sam.AddOrReplaceReadGroups INPUT=sample1.bam OUTPUT=sample1_rg.bam RGID=1 RGLB=lib1 RGPL=PB RGPU=1 RGSM=sample1 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Tue Sep 29 08:31:36 EDT 2020] Executing as abc@ai-hpcn106.cm.cluster on Linux 3.10.0-327.36.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14; Picard version: 1.119(d44cdb51745f5e8075c826430a39d8a61f1dd832_1408991805) IntelDeflater
INFO 2020-09-29 08:31:37 AddOrReplaceReadGroups Created read group ID=1 PL=PB LB=lib1 SM=sample1
[Tue Sep 29 08:31:37 EDT 2020] picard.sam.AddOrReplaceReadGroups done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
Using GATK jar /sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar Mutect2 -R reference.fa -I sample1_rg.bam -O sample1_rg.vcf.gz
08:31:40.308 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Sep 29, 2020 8:31:40 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
08:31:40.583 INFO Mutect2 - ------------------------------------------------------------
08:31:40.584 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.6.0
08:31:40.584 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
08:31:40.586 INFO Mutect2 - Executing as abc@ai-hpcn106.cm.cluster on Linux v3.10.0-327.36.1.el7.x86_64 amd64
08:31:40.586 INFO Mutect2 - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_92-b14
08:31:40.587 INFO Mutect2 - Start Date/Time: September 29, 2020 8:31:40 AM EDT
08:31:40.587 INFO Mutect2 - ------------------------------------------------------------
08:31:40.587 INFO Mutect2 - ------------------------------------------------------------
08:31:40.588 INFO Mutect2 - HTSJDK Version: 2.21.2
08:31:40.588 INFO Mutect2 - Picard Version: 2.21.9
08:31:40.588 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:31:40.588 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:31:40.588 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:31:40.588 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:31:40.588 INFO Mutect2 - Deflater: IntelDeflater
08:31:40.588 INFO Mutect2 - Inflater: IntelInflater
08:31:40.589 INFO Mutect2 - GCS max retries/reopens: 20
08:31:40.589 INFO Mutect2 - Requester pays: disabled
08:31:40.589 INFO Mutect2 - Initializing engine
08:31:41.027 INFO Mutect2 - Done initializing engine
08:31:41.052 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
08:31:41.060 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
08:31:41.089 WARN NativeLibraryLoader - Unable to load libgkl_pairhmm_omp.so from native/libgkl_pairhmm_omp.so (/tmp/libgkl_pairhmm_omp5792852026840319140.so: /sysapps/cluster/software/GCC/4.8.4/lib64/libgomp.so.1: version `GOMP_4.0' not found (required by /tmp/libgkl_pairhmm_omp5792852026840319140.so))
08:31:41.090 INFO PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
08:31:41.090 INFO NativeLibraryLoader - Loading libgkl_pairhmm.so from jar:file:/sysapps/cluster/software/GATK/4.1.6.0-Java-1.8.0_92/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm.so
08:31:41.164 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
08:31:41.165 WARN IntelPairHmm - Ignoring request for 4 threads; not using OpenMP implementation
08:31:41.165 INFO PairHMM - Using the AVX-accelerated native PairHMM implementation
08:31:41.205 INFO ProgressMeter - Starting traversal
08:31:41.206 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
08:31:41.344 INFO Mutect2 - 0 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonChimericOriginalAlignmentReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: ReadLengthReadFilter
0 read(s) filtered by: GoodCigarReadFilter
265 read(s) filtered by: WellformedReadFilter
265 total reads filtered
08:31:41.345 INFO ProgressMeter - Reference:2701 0.0 19 8260.9
08:31:41.345 INFO ProgressMeter - Traversal complete. Processed 19 total regions in 0.0 minutes.
08:31:41.358 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
08:31:41.358 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
08:31:41.358 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
08:31:41.359 INFO Mutect2 - Shutting down engine
[September 29, 2020 8:31:41 AM EDT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2326265856
Do you know if Mutect2 can be used with long reads? Since it does local realignments, I am not sure if you can.
@genomax this was suggested by PacBio reps along with DeepVariant. For my purpose it works fine (only if I could get this to work).
The problem appears to be the WellformedReadFilter, which is filtering all my reads:
265 read(s) filtered by: WellformedReadFilter
https://gatk.broadinstitute.org/hc/en-us/articles/360037267451-WellformedReadFilter
Disabling this filter leads to array out of index error.