Hi,
I have a large number of GVCF files that I'm trying to joint-genotype, starting by running GenomicsDBImport in GATK 4.1.4.0. When I say large, I mean 135 samples * 229 genomic intervals = 30,915 files.
Here's what I have:
java -Xmx80g -XX:ParallelGCThreads=20 -jar $GATKPATH GenomicsDBImport -L $LIST \
-V ${SLURM_ARRAY_TASK_ID}.1.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.2.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.3.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.4.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.5.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.6.raw.g.vcf \
...
-V ${SLURM_ARRAY_TASK_ID}.133.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.134.raw.g.vcf \
-V ${SLURM_ARRAY_TASK_ID}.135.raw.g.vcf \
--merge-input-intervals true \
--genomicsdb-workspace-path /n/holyscratch01/edwards_lab/rafa/genomic_DBs/db_${SLURM_ARRAY_TASK_ID}
where $LIST points to the location of the scaffold list for each interval, and the task ID identifies the interval.
This runs for a while but then this happens:
13:43:09.139 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/n/holyscratch01/edwards_lab/rafa/gatk-package-4.1.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 20, 2020 1:43:13 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:43:13.385 INFO GenomicsDBImport - ------------------------------------------------------------
13:43:13.385 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.4.0
13:43:13.385 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
13:43:14.389 INFO GenomicsDBImport - Executing as rmarcondes@holy2c02310.rc.fas.harvard.edu on Linux v3.10.0-957.12.1.el7.x86_64 amd64
13:43:14.389 INFO GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v10.0.1+10
13:43:14.389 INFO GenomicsDBImport - Start Date/Time: March 20, 2020 at 1:43:09 PM GMT-05:00
13:43:14.389 INFO GenomicsDBImport - ------------------------------------------------------------
13:43:14.389 INFO GenomicsDBImport - ------------------------------------------------------------
13:43:14.390 INFO GenomicsDBImport - HTSJDK Version: 2.20.3
13:43:14.390 INFO GenomicsDBImport - Picard Version: 2.21.1
13:43:14.390 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:43:14.390 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:43:14.390 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:43:14.390 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:43:14.390 INFO GenomicsDBImport - Deflater: IntelDeflater
13:43:14.390 INFO GenomicsDBImport - Inflater: IntelInflater
13:43:14.390 INFO GenomicsDBImport - GCS max retries/reopens: 20
13:43:14.390 INFO GenomicsDBImport - Requester pays: disabled
13:43:14.391 INFO GenomicsDBImport - Initializing engine
13:44:18.385 INFO IntervalArgumentCollection - Processing 48059334 bp from intervals
13:44:18.412 INFO GenomicsDBImport - Done initializing engine
13:44:18.806 INFO GenomicsDBImport - Shutting down engine
[March 20, 2020 at 1:44:18 PM GMT-05:00] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 1.16 minutes.
Runtime.totalMemory()=11559501824
***********************************************************************
A USER ERROR has occurred: Error creating GenomicsDB workspace: /n/holyscratch01/edwards_lab/rafa/genomic_DBs/db_177 already exists
Thanks for any pointers!
If you check the last line of the log, the error is spelled out: the GenomicsDB workspace directory db_177 already exists, presumably left over from a previous run.
You should consider a different naming strategy for the workspace directory, so that reruns of the same array task don't collide with leftovers from earlier attempts.
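For example, a minimal sketch assuming the same paths as in your command ($SLURM_ARRAY_JOB_ID is the standard SLURM variable identifying the parent array job, not something from your original script):

--genomicsdb-workspace-path /n/holyscratch01/edwards_lab/rafa/genomic_DBs/db_${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}

That way each new submission of the array writes to a fresh set of workspaces.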
Alternatively, just remove the directory db_177 and try again, but make sure the parent genomic_DBs directory exists first.
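Something along these lines before the GATK call should work (a minimal sketch using the paths from your command; DB_ROOT is a hypothetical variable introduced here for readability):

DB_ROOT=/n/holyscratch01/edwards_lab/rafa/genomic_DBs
mkdir -p "$DB_ROOT"                           # create the parent directory if it doesn't exist yet
rm -rf "$DB_ROOT/db_${SLURM_ARRAY_TASK_ID}"   # remove any stale workspace left by a failed run

If I remember correctly, GenomicsDBImport also has an --overwrite-existing-genomicsdb-workspace flag that removes the old workspace for you, but deleting it explicitly in the job script is easier to reason about.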