Error in GATK GenomicsDBImport
4.7 years ago
raf.marcondes ▴ 110

Hi,

I have a large number of GVCF files that I'm trying to joint-genotype, starting with GenomicsDBImport in GATK 4.1.4.0. By large I mean 135 samples × 229 genomic intervals = 30,915 files.

Here's what I have:

java -Xmx80g -XX:ParallelGCThreads=20 -jar $GATKPATH GenomicsDBImport -L $LIST \
-V ${SLURM_ARRAY_TASK_ID}.1.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.2.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.3.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.4.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.5.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.6.raw.g.vcf \
...
-V ${SLURM_ARRAY_TASK_ID}.133.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.134.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.135.raw.g.vcf \
--merge-input-intervals true \
--genomicsdb-workspace-path /n/holyscratch01/edwards_lab/rafa/genomic_DBs/db_${SLURM_ARRAY_TASK_ID}

where $LIST points to the scaffold list for each interval, and the array task ID identifies the interval.

This runs for a while but then this happens:

13:43:09.139 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/n/holyscratch01/edwards_lab/rafa/gatk-package-4.1.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 20, 2020 1:43:13 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:43:13.385 INFO  GenomicsDBImport - ------------------------------------------------------------
13:43:13.385 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.4.0
13:43:13.385 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
13:43:14.389 INFO  GenomicsDBImport - Executing as rmarcondes@holy2c02310.rc.fas.harvard.edu on Linux v3.10.0-957.12.1.el7.x86_64 amd64
13:43:14.389 INFO  GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v10.0.1+10
13:43:14.389 INFO  GenomicsDBImport - Start Date/Time: March 20, 2020 at 1:43:09 PM GMT-05:00
13:43:14.389 INFO  GenomicsDBImport - ------------------------------------------------------------
13:43:14.389 INFO  GenomicsDBImport - ------------------------------------------------------------
13:43:14.390 INFO  GenomicsDBImport - HTSJDK Version: 2.20.3
13:43:14.390 INFO  GenomicsDBImport - Picard Version: 2.21.1
13:43:14.390 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:43:14.390 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:43:14.390 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:43:14.390 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:43:14.390 INFO  GenomicsDBImport - Deflater: IntelDeflater
13:43:14.390 INFO  GenomicsDBImport - Inflater: IntelInflater
13:43:14.390 INFO  GenomicsDBImport - GCS max retries/reopens: 20
13:43:14.390 INFO  GenomicsDBImport - Requester pays: disabled
13:43:14.391 INFO  GenomicsDBImport - Initializing engine
13:44:18.385 INFO  IntervalArgumentCollection - Processing 48059334 bp from intervals
13:44:18.412 INFO  GenomicsDBImport - Done initializing engine
13:44:18.806 INFO  GenomicsDBImport - Shutting down engine
[March 20, 2020 at 1:44:18 PM GMT-05:00] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 1.16 minutes.
Runtime.totalMemory()=11559501824
***********************************************************************

A USER ERROR has occurred: Error creating GenomicsDB workspace: /n/holyscratch01/edwards_lab/rafa/genomic_DBs/db_177 already exists

Thanks for any pointers!!!!

gatk GenomicsDBImport • 5.3k views

If you check the last line of the log, the error is already stated there:

A USER ERROR has occurred: Error creating GenomicsDB workspace: /n/holyscratch01/edwards_lab/rafa/genomic_DBs/db_177 already exists

You should consider a different naming strategy for the workspace directory, so that a re-run does not collide with one left over from an earlier attempt.
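One way to do that, as a minimal sketch: append a per-run tag to the workspace name. The helper name `unique_workspace` is made up for illustration, and using the Slurm array job ID as the tag is only an assumption about what would suit this setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): build a workspace path that
# includes a per-run tag, so a re-run never reuses an existing directory.
#   $1 = base directory, $2 = interval/task ID, $3 = run tag (optional)
unique_workspace() {
    local base="$1"
    local task="$2"
    # Fall back to a timestamp if no run tag is supplied.
    local run="${3:-$(date +%Y%m%d_%H%M%S)}"
    printf '%s/db_%s_%s\n' "$base" "$task" "$run"
}

# Example use inside the array job (SLURM_ARRAY_JOB_ID is set by Slurm):
#   WORKSPACE=$(unique_workspace /path/to/genomic_DBs "$SLURM_ARRAY_TASK_ID" "$SLURM_ARRAY_JOB_ID")
#   ... --genomicsdb-workspace-path "$WORKSPACE"
```

The downside is that failed runs leave their directories behind, so they need cleaning up separately.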


Just remove the directory "db_177" and try again. But make sure the parent directory, genomic_DBs, has been created.
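A minimal sketch of doing this automatically at the top of the job script. The function name `prepare_workspace` is hypothetical, and the sketch assumes it is acceptable to delete whatever a previous attempt left behind:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): make sure the parent directory
# exists and that no leftover workspace is in the way before the import.
prepare_workspace() {
    local workspace="$1"
    # Refuse an empty argument so rm -rf never runs on an unintended path.
    if [ -z "$workspace" ]; then
        echo "usage: prepare_workspace <workspace-path>" >&2
        return 1
    fi
    # The parent directory has to exist before the import starts.
    mkdir -p -- "$(dirname -- "$workspace")"
    # Remove a workspace left over from an earlier (failed) run.
    if [ -e "$workspace" ]; then
        echo "Removing existing workspace: $workspace" >&2
        rm -rf -- "$workspace"
    fi
}

# Example use, before the java ... GenomicsDBImport command:
#   prepare_workspace "/path/to/genomic_DBs/db_${SLURM_ARRAY_TASK_ID}"
```

Be careful with this if a workspace might hold a finished import you still need, since it is deleted without asking.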

4.1 years ago
yussab ▴ 100

I went through the GATK GenomicsDBImport documentation, and this is the solution. You can find all the useful information at the link below.

REFERENCE: https://gatk.broadinstitute.org/hc/en-us/articles/360036883491-GenomicsDBImport

IMPORTANT: "The --genomicsdb-workspace-path must point to a non-existent or empty directory."
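If you would rather fail fast than delete anything, a small pre-flight check along these lines can go before the GATK call. This is only a sketch of the rule quoted above; `workspace_is_usable` is a hypothetical name, and whether your GATK version accepts an existing empty directory is something to confirm for yourself.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of GATK): succeeds only if the
# path does not exist yet, or is a directory with nothing in it.
workspace_is_usable() {
    local ws="$1"
    # A non-existent path is fine.
    if [ ! -e "$ws" ]; then
        return 0
    fi
    # An existing path must be a directory with no entries (ls -A also
    # lists hidden files, so those count as "not empty").
    if [ -d "$ws" ] && [ -z "$(ls -A -- "$ws")" ]; then
        return 0
    fi
    return 1
}

# Example use in the job script:
#   if ! workspace_is_usable "$WORKSPACE"; then
#       echo "Workspace $WORKSPACE exists and is not empty; aborting." >&2
#       exit 1
#   fi
```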

Remember to mark the post as solved once you've got the correct answer ;)
