Dear BioStar Users,
I have a problem with GAT4 ASEReadCounter
and could not find help googling or in GATK4 forum.
My code:
gatk ASEReadCounter -I sortedByCoord.groups.bam -V mutect.snp.X.chrom.vcf --output-format CSV -O output.csv
and the output
Using GATK jar /software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar ASEReadCounter -I /Synology/002-Projects/320-Adamo_Escape/3207_BAM/sortedByCoord.groups.bam -V /Synology/002-Projects/320-Adamo_Escape/chrom_hist_nopar/S3_1_mutect.snp.X.noPAR.header.vcf -O S3_1.3207_1294.47XXY.Rtable
16:32:59.086 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Dec 18, 2019 4:32:59 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:32:59.645 INFO ASEReadCounter - ------------------------------------------------------------
16:32:59.646 INFO ASEReadCounter - The Genome Analysis Toolkit (GATK) v4.1.4.1
16:32:59.647 INFO ASEReadCounter - For support and documentation go to https://software.broadinstitute.org/gatk/
16:32:59.648 INFO ASEReadCounter - Executing as asaera@sequentia on Linux v4.15.0-72-generic amd64
16:32:59.648 INFO ASEReadCounter - Java runtime: OpenJDK 64-Bit Server VM v11.0.5+10-post-Ubuntu-0ubuntu1.118.04
16:32:59.648 INFO ASEReadCounter - Start Date/Time: 18 de diciembre de 2019, 16:32:58 UTC
16:32:59.649 INFO ASEReadCounter - ------------------------------------------------------------
16:32:59.649 INFO ASEReadCounter - ------------------------------------------------------------
16:32:59.652 INFO ASEReadCounter - HTSJDK Version: 2.21.0
16:32:59.652 INFO ASEReadCounter - Picard Version: 2.21.2
16:32:59.652 INFO ASEReadCounter - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:32:59.653 INFO ASEReadCounter - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:32:59.653 INFO ASEReadCounter - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:32:59.654 INFO ASEReadCounter - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:32:59.654 INFO ASEReadCounter - Deflater: IntelDeflater
16:32:59.654 INFO ASEReadCounter - Inflater: IntelInflater
16:32:59.654 INFO ASEReadCounter - GCS max retries/reopens: 20
16:32:59.654 INFO ASEReadCounter - Requester pays: disabled
16:32:59.655 INFO ASEReadCounter - Initializing engine
16:33:00.299 INFO FeatureManager - Using codec VCFCodec to read file file:///Synology/002-Projects/320-Adamo_Escape/chrom_hist_nopar/S3_1_mutect.snp.X.noPAR.header.vcf
16:33:00.379 INFO ASEReadCounter - Done initializing engine
16:33:00.445 INFO ProgressMeter - Starting traversal
16:33:00.446 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
16:33:00.534 INFO ASEReadCounter - Shutting down engine
[18 de diciembre de 2019, 16:33:00 UTC] org.broadinstitute.hellbender.tools.walkers.rnaseq.ASEReadCounter done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2155872256
java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
at org.broadinstitute.hellbender.engine.ReferenceContext.getBase(ReferenceContext.java:406)
at org.broadinstitute.hellbender.tools.walkers.rnaseq.ASEReadCounter.apply(ASEReadCounter.java:183)
at org.broadinstitute.hellbender.engine.LocusWalker.lambda$traverse$0(LocusWalker.java:180)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:178)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
There are no errors, as I've seen in other posts, and the java exceptions are not very informative (at least for me). The csv only contains the header, meaning no results.
Following gatk Got a problem? (grey box on the left,) suggestions I did:
1.- Checking of the vcf with gatk ValidateVariants
as recommended here, but seems fine.
gatk ValidateVariants -R Homo_sapiens.GRCh37.dna.primary_assembly.fa -V mutect.snp.X.chrom.vcf
Using GATK jar /software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar ValidateVariants -R Homo_sapiens.GRCh37.dna.primary_assembly.fa -V mutect.snp.X.chrom.vcf
17:50:22.728 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Dec 18, 2019 5:50:23 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
17:50:23.105 INFO ValidateVariants - ------------------------------------------------------------
17:50:23.106 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.4.1
17:50:23.106 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
17:50:23.106 INFO ValidateVariants - Executing as asaera@sequentia on Linux v4.15.0-72-generic amd64
17:50:23.107 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v11.0.5+10-post-Ubuntu-0ubuntu1.118.04
17:50:23.107 INFO ValidateVariants - Start Date/Time: 18 de diciembre de 2019, 17:50:22 UTC
17:50:23.107 INFO ValidateVariants - ------------------------------------------------------------
17:50:23.107 INFO ValidateVariants - ------------------------------------------------------------
17:50:23.108 INFO ValidateVariants - HTSJDK Version: 2.21.0
17:50:23.109 INFO ValidateVariants - Picard Version: 2.21.2
17:50:23.109 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:50:23.109 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:50:23.109 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:50:23.110 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:50:23.110 INFO ValidateVariants - Deflater: IntelDeflater
17:50:23.110 INFO ValidateVariants - Inflater: IntelInflater
17:50:23.110 INFO ValidateVariants - GCS max retries/reopens: 20
17:50:23.110 INFO ValidateVariants - Requester pays: disabled
17:50:23.111 INFO ValidateVariants - Initializing engine
17:50:23.702 INFO FeatureManager - Using codec VCFCodec to read file file://mutect.snp.X.chrom.vcf
17:50:23.765 INFO ValidateVariants - Done initializing engine
17:50:23.770 WARN ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
17:50:23.770 WARN ValidateVariants - Other possible validations will still be performed
17:50:23.770 INFO ProgressMeter - Starting traversal
17:50:23.770 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
17:50:26.430 INFO ProgressMeter - X:152780582 0.0 25690 581880.0
17:50:26.431 INFO ProgressMeter - Traversal complete. Processed 25690 total variants in 0.0 minutes.
17:50:26.431 INFO ValidateVariants - Shutting down engine
[18 de diciembre de 2019, 17:50:26 UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=3183476736
2.- Checking of the BAM file with picard.jar ValidateSamFile
following this and this, but again seems fine.
java -jar /software/picard.jar ValidateSamFile I=sortedByCoord.groups.bam MODE=SUMMARY
INFO 2019-12-18 17:48:13 ValidateSamFile
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** ValidateSamFile -I sortedByCoord.groups.bam -MODE SUMMARY
**********
17:48:14.441 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/software/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Dec 18 17:48:14 UTC 2019] ValidateSamFile INPUT=sortedByCoord.groups.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Dec 18 17:48:14 UTC 2019] Executing as asaera@sequentia on Linux 4.15.0-72-generic amd64; OpenJDK 64-Bit Server VM 11.0.5+10-post-Ubuntu-0ubuntu1.118.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.1-SNAPSHOT
WARNING 2019-12-18 17:48:14 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
INFO 2019-12-18 17:49:51 SamFileValidator Validated Read 10.000.000 records. Elapsed time: 00:01:36s. Time for last 10.000.000: 96s. Last read position: 10:33.208.922
INFO 2019-12-18 17:51:46 SamFileValidator Validated Read 20.000.000 records. Elapsed time: 00:03:31s. Time for last 10.000.000: 115s. Last read position: 12:49.578.991
INFO 2019-12-18 17:53:53 SamFileValidator Validated Read 30.000.000 records. Elapsed time: 00:05:38s. Time for last 10.000.000: 126s. Last read position: 15:69.075.317
INFO 2019-12-18 17:56:06 SamFileValidator Validated Read 40.000.000 records. Elapsed time: 00:07:51s. Time for last 10.000.000: 133s. Last read position: 17:61.670.652
INFO 2019-12-18 17:58:44 SamFileValidator Validated Read 50.000.000 records. Elapsed time: 00:10:29s. Time for last 10.000.000: 157s. Last read position: 2:47.387.777
INFO 2019-12-18 18:01:17 SamFileValidator Validated Read 60.000.000 records. Elapsed time: 00:13:02s. Time for last 10.000.000: 153s. Last read position: 22:45.598.970
INFO 2019-12-18 18:04:07 SamFileValidator Validated Read 70.000.000 records. Elapsed time: 00:15:53s. Time for last 10.000.000: 170s. Last read position: 5:137.848.541
INFO 2019-12-18 18:07:08 SamFileValidator Validated Read 80.000.000 records. Elapsed time: 00:18:53s. Time for last 10.000.000: 180s. Last read position: 7:56.124.013
INFO 2019-12-18 18:10:00 SamFileValidator Validated Read 90.000.000 records. Elapsed time: 00:21:45s. Time for last 10.000.000: 171s. Last read position: MT:4.489
## HISTOGRAM java.lang.String
Error Type Count
ERROR:MATE_NOT_FOUND 33070
WARNING:MISSING_TAG_NM 98426757
[Wed Dec 18 18:12:35 UTC 2019] picard.sam.ValidateSamFile done. Elapsed time: 24,36 minutes.
Runtime.totalMemory()=3170893824
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
I tried to post on GATK forum but could not do it since I am a new user and new user's posts are not allowed for some issue with SPAM.
Any ideas? your help will be highly appreciated.
Thanks
This worked for me, too! Many thanks!
A few additional notes to save future users some trial and error:
To save a few more minutes, just in case the link provided in the comment does not load: To make a Fasta dict file one can use this GATK sub-command:
gatk CreateSequenceDictionary -R reference.genome.fa
. I found this infor from this GATK documentation.I meet the same preoblem , and I finally made it work by AddOrReplaceReadGroups to my bam file.
use this GATK sub-command: gatk AddOrReplaceReadGroups -I ./KC.bam -O addedgroup_KC.bam -SORT_ORDER coordinate -RGID 4 -RGLB DNA -RGPL ILLUMINA -RGPU RNAseq -RGSM KC